Not difficult to understand. I learned it from Coursera.
Prefix Function A “Border” of a string, is a substring that both acts as a prefix and a suffix of that string The “Prefix Function” of a string records the length of the longest border of every prefix of that string Symbol s is a string s[:i] is a prefix of s made up by s[0], s[1], ..., s[i-1] L[i] is the length of the longest border of s[:i] In other words, L is the “Prefix Function” of s Conclusion L[i+1] <= L[i]+1, and the equality holds if and only if s[L] = s[i] The longest border of the longest border of a string, is the second longest border of that string. if some border of s[:i] has a length of len and s[len] = s[i], then there must be a border in s[:i+1] with a length of len+1 Method L[0] = 0 L[i+1] = len + 1, where len is the length of the longest border of s[:i] that satisfies s[len] = s[i] To find len, try every border of s[:i] from the longest to the shortest To find the next longest border, see Conclusion 2 If no border is found, simply compare s[0] with s[i] If s[0] = s[i], then L[i+1] = 1 Else, L[i+1] = 0 Implementation int *pf(string s) { int len=s.size(),b=0; int *p=new int[len]; p[0]=0; for(int i=1;i<len;i++) { while(b>0 && s[i]!=s[b]) b=p[b-1]; if(s[i]==s[b]) b++; p[i] = b; } return p; } KMP Symbol p is the pattern string, t is the text string L is the “Prefix Function” of p Conclusion If p matches t partially with a longest common prefix p[:i], then no matches will be found before p is shifted len(p) - L[i] positions rightwards. There are two “submatches” in this partial match, both of which are the longest border of p[:i] The left one acts as a prefix, while the right one acts as a postfix p is shifted rightwards, so does p[:i] No matches are possible before the left submatch arrives at the previous position of the right submatch The length of this “vacuum area” is len(p) - L[i] Method Make s = p + '$' + t, where ‘$’ is absent from both p and t Calculate the “Prefix Function” of s, marked as L A match is found at position i - 2 * len(p) when L[i] = len(p) Under such circumstances, the longest border of s[:i] happens to be p itself, as p is a prefix of s This border is also a substring of s Thanks to the ‘$’ sign, we can guarantee that this border never crosses the real “border” between p and t, so it is also a substring of t Thus the starting position of the match is i - (len(p) - 1) in s, and i - (len(p) - 1) - (len(p) + 1) in t Implementation int main() { string pattern, text, s; int *pf; while(cin >> pattern >> text && pattern != "") { if(pattern.size() > text.size()) { cout << "-1" << endl; continue; } s = pattern + "$" + text; pf = prefix_func(s); for(int i = pattern.size()+1; i < s.size(); i++) { if(pf[i] == pattern.size()) { cout << i - 2 * pattern.size() << " "; } } cout << endl; delete pf; } return 0; }