Page 1 of 1

Questions about the KMP search algorithm

Posted: Mon Nov 14, 2022 1:47 pm
by KalamyQ

I'm having trouble comprehending the KMP algorithm. I understand prefix-suffix and have written code to calculate the prefix-suffix table:

Code: Select all

 private int[] calculatePrefSuffArray(String pattern) {
    char patternArray[] = pattern.toCharArray();
    int tab[] = new int[pattern.length()];
    tab[0] = 0;
    int t = tab[0];
    for (int i = 1; i < pattern.length(); i++) { // t = tab[i-1]
        while(t >0 && patternArray[i] != patternArray[t] ) {
            t = tab[t-1];
        }
        if(patternArray[i] == patternArray[t]) t++;
        tab[i] = t;
    }
    return tab;
}

This is the algorithm outlined in the scaler topics. But I'm not sure how to utilize it in KMP. Could you please explain it to me?


Re: KMP algorithm

Posted: Tue Nov 15, 2022 5:34 pm
by Burunduk

Is it java? I don't understand java. Rewritten in python, it seems to work properly. So, you know how to get the lps table but don't know what to do with it next, do you?

The site you've linked to has a detailed explanation already. Basically all the fuss is about partial matches. For random strings the simple search can be quite good but for long series of identical characters the KMP algorithm is more efficient.

Let's rewrite the code from geeksforgeeks.org to see the difference:

Code: Select all

import sys

def lps(s):
  tab = [0] * len(s)
  t = tab[0]
  for i in range(1, len(s)):
    while t > 0 and s[i] != s[t]:
      t = tab[t-1]
    if s[i] == s[t]: t = t + 1
    tab[i] = t
  return tab

def simple_search(pattern, string):
  patlen = len(pattern)
  strlen = len(string)
  
  i = j = 0
  
  cnt = 0
  while i < strlen:
    cnt += 1
    if pattern[j] == string[i]:
      i += 1
      j += 1
      if j == patlen:
        print("pattern found at %i" % (i - j))
      else: continue
    i = i - j + 1 # may need to backtrack i here
    j = 0
  print("Simple: %i iteration(s)" % cnt)

def kmp_search(pattern, string):
  patlen = len(pattern)
  strlen = len(string)
  
  tab = lps(pattern)
  
  i = j = 0
  
  cnt = 0
  while i < strlen:
    cnt += 1
    if pattern[j] == string[i]:
      i += 1
      j += 1
      if j == patlen:
        print("pattern found at %i" % (i - j))
      else: continue
    if j != 0:
      j = tab[j-1] # use the table for j, don't backtrack i
    else:
      i += 1
  print("KMP:    %i iteration(s)" % cnt)

word = sys.argv[1]
text = sys.argv[2]
simple_search(word, text)
kmp_search(word, text)

The tests show that the KMP search needs fewer iterations to complete:

Code: Select all

root# python lps.py puppy "the puppy linux"
pattern found at 4
Simple: 21 iteration(s)
pattern found at 4
KMP:    15 iteration(s)
root# python lps.py aaaaa aaaaabaaaaaa
pattern found at 0
pattern found at 6
pattern found at 7
Simple: 34 iteration(s)
pattern found at 0
pattern found at 6
pattern found at 7
KMP:    16 iteration(s)
root# python lps.py AABA AABAACAADAABAABA
pattern found at 0
pattern found at 9
pattern found at 12
Simple: 34 iteration(s)
pattern found at 0
pattern found at 9
pattern found at 12
KMP:    20 iteration(s)

Re: KMP algorithm

Posted: Wed Nov 23, 2022 2:01 pm
by urin123

Frankly, you need to understand the basic concept before jumping into the code details. Hope you have enough resources to go through the topic. It is an advanced algorithm. If you still have doubt, you can go through https://www.w3spot.com/2020/07/kmp-algo ... glish.html. It contains examples with identical characters & non-identical characters both. Once you grasp the concept, you can go to GeeksForGeeks & check the code implementation.