Page 1 of 1
Questions about the KMP search algorithm
Posted: Mon Nov 14, 2022 1:47 pm
by KalamyQ
I'm having trouble comprehending the KMP algorithm. I understand prefix-suffix and have written code to calculate the prefix-suffix table:
Code: Select all
private int[] calculatePrefSuffArray(String pattern) {
char patternArray[] = pattern.toCharArray();
int tab[] = new int[pattern.length()];
tab[0] = 0;
int t = tab[0];
for (int i = 1; i < pattern.length(); i++) { // t = tab[i-1]
while(t >0 && patternArray[i] != patternArray[t] ) {
t = tab[t-1];
}
if(patternArray[i] == patternArray[t]) t++;
tab[i] = t;
}
return tab;
}
This is the algorithm outlined in the scaler topics. But I'm not sure how to utilize it in KMP. Could you please explain it to me?
Re: KMP algorithm
Posted: Tue Nov 15, 2022 5:34 pm
by Burunduk
Is it java? I don't understand java. Rewritten in python, it seems to work properly. So, you know how to get the lps table but don't know what to do with it next, do you?
The site you've linked to has a detailed explanation already. Basically all the fuss is about partial matches. For random strings the simple search can be quite good but for long series of identical characters the KMP algorithm is more efficient.
Let's rewrite the code from geeksforgeeks.org to see the difference:
Code: Select all
import sys
def lps(s):
tab = [0] * len(s)
t = tab[0]
for i in range(1, len(s)):
while t > 0 and s[i] != s[t]:
t = tab[t-1]
if s[i] == s[t]: t = t + 1
tab[i] = t
return tab
def simple_search(pattern, string):
patlen = len(pattern)
strlen = len(string)
i = j = 0
cnt = 0
while i < strlen:
cnt += 1
if pattern[j] == string[i]:
i += 1
j += 1
if j == patlen:
print("pattern found at %i" % (i - j))
else: continue
i = i - j + 1 # may need to backtrack i here
j = 0
print("Simple: %i iteration(s)" % cnt)
def kmp_search(pattern, string):
patlen = len(pattern)
strlen = len(string)
tab = lps(pattern)
i = j = 0
cnt = 0
while i < strlen:
cnt += 1
if pattern[j] == string[i]:
i += 1
j += 1
if j == patlen:
print("pattern found at %i" % (i - j))
else: continue
if j != 0:
j = tab[j-1] # use the table for j, don't backtrack i
else:
i += 1
print("KMP: %i iteration(s)" % cnt)
word = sys.argv[1]
text = sys.argv[2]
simple_search(word, text)
kmp_search(word, text)
The tests show that the KMP search needs fewer iterations to complete:
Code: Select all
root# python lps.py puppy "the puppy linux"
pattern found at 4
Simple: 21 iteration(s)
pattern found at 4
KMP: 15 iteration(s)
root# python lps.py aaaaa aaaaabaaaaaa
pattern found at 0
pattern found at 6
pattern found at 7
Simple: 34 iteration(s)
pattern found at 0
pattern found at 6
pattern found at 7
KMP: 16 iteration(s)
root# python lps.py AABA AABAACAADAABAABA
pattern found at 0
pattern found at 9
pattern found at 12
Simple: 34 iteration(s)
pattern found at 0
pattern found at 9
pattern found at 12
KMP: 20 iteration(s)
Re: KMP algorithm
Posted: Wed Nov 23, 2022 2:01 pm
by urin123
Frankly, you need to understand the basic concept before jumping into the code details. Hope you have enough resources to go through the topic. It is an advanced algorithm. If you still have doubt, you can go through https://www.w3spot.com/2020/07/kmp-algo ... glish.html. It contains examples with identical characters & non-identical characters both. Once you grasp the concept, you can go to GeeksForGeeks & check the code implementation.