Dai Chao

fordring866@gmail.com | Put your faith in the Light!

The KMP Algorithm

Not difficult to understand. I learned it from Coursera.

Prefix Function

Symbol

Conclusion

  1. L[i+1] <= L[i]+1, and the equality holds if and only if s[L] = s[i]
  2. The longest border of the longest border of a string, is the second longest border of that string.
  3. if some border of s[:i] has a length of len and s[len] = s[i], then there must be a border in s[:i+1] with a length of len+1

Method

  1. L[0] = 0
  2. L[i+1] = len + 1, where len is the length of the longest border of s[:i] that satisfies s[len] = s[i]
    • To find len, try every border of s[:i] from the longest to the shortest
    • To find the next longest border, see Conclusion 2
    • If no border is found, simply compare s[0] with s[i]
    • If s[0] = s[i], then L[i+1] = 1
    • Else, L[i+1] = 0

Implementation

int *pf(string s) {
	int len=s.size(),b=0;
	int *p=new int[len]; p[0]=0;
	for(int i=1;i<len;i++) {
		while(b>0 && s[i]!=s[b]) b=p[b-1];
		if(s[i]==s[b]) b++;
		p[i] = b;
	}
	return p;
}

KMP

Symbol

Conclusion

Method

  1. Make s = p + '$' + t, where ‘$’ is absent from both p and t
  2. Calculate the “Prefix Function” of s, marked as L
  3. A match is found at position i - 2 * len(p) when L[i] = len(p)
    • Under such circumstances, the longest border of s[:i] happens to be p itself, as p is a prefix of s
    • This border is also a substring of s
    • Thanks to the ‘$’ sign, we can guarantee that this border never crosses the real “border” between p and t, so it is also a substring of t
    • Thus the starting position of the match is i - (len(p) - 1) in s, and i - (len(p) - 1) - (len(p) + 1) in t

Implementation

int main() {
	string pattern, text, s;
	int *pf;
	while(cin >> pattern >> text && pattern != "") {
		if(pattern.size() > text.size()) {
			cout << "-1" << endl;
			continue;
		}
		s = pattern + "$" + text;
		pf = prefix_func(s);
		for(int i = pattern.size()+1; i < s.size(); i++) {
			if(pf[i] == pattern.size()) {
				cout << i - 2 * pattern.size() << " ";
			}
		}
		cout << endl;
		delete pf;
	}
	return 0;
}