Knuth-Morris-Pratt (KMP) Algorithm for Pattern Matching

Creating the next[j] table


To find a pattern in a string using the KMP algorithm, we first need to do some preprocessing by
building a table called the next[j] table.
Suppose we are given the pattern 11011100. We construct the next[j] table for this pattern by comparing
the pattern with itself. This is done by writing the same pattern twice, one copy below the other, and
shifting the bottom copy to the right by one digit.
11011100     [top pattern]
 11011100    [bottom pattern]

Comparing the above two patterns is done systematically. We first take 1 bit of the top pattern and bottom
pattern into consideration, then 2 bits of the top and bottom patterns, and so on.
If we consider only one bit in each of the patterns for the comparison, we don't need most of the bits shown
above and can rewrite the above as:
1xxxxxxx     [top pattern]
 1xxxxxxx    [bottom pattern]

Now with only the first bits in both patterns in mind (i.e. forget about all the x's), we look for an overlap
between the top pattern and the bottom pattern. In the above case, there is no such overlap. Therefore, we
say that when j = 1, next[j] = 0. By j = 1, we mean considering the first bit of each pattern. By next[j] = 0
we mean 0 bits matched.
We've so far compared the first bit of the pattern with itself. Now we compare the first two bits of the
pattern with itself (i.e. for j = 2) as follows:
11xxxxxx     [top pattern]
 11xxxxxx    [bottom pattern]

We look for an overlap between the relevant bits of the above patterns (i.e. the bits that are not x's). We
see that the first bit of the bottom pattern overlaps with the second bit of the top pattern, and the two
bits are equal (both are 1). So, when j = 2, next[j] = 1 (because only 1 bit matched).
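The check for a single shift can be sketched in Python as below. This is only an illustration: the function name `overlap_matches` and its `shift` parameter are not from the notes, and `j` is the number of leading bits under consideration, as in the text.

```python
def overlap_matches(pattern: str, j: int, shift: int) -> bool:
    """Return True if the copy shifted right by `shift` agrees with the top
    pattern on every overlapping bit, looking only at the first j bits."""
    top = pattern[shift:j]        # bits of the top pattern that sit above the shifted copy
    bottom = pattern[:j - shift]  # leading bits of the shifted copy
    return top == bottom


pattern = "11011100"
# j = 2, shift 1: second bit of the top pattern against first bit of the bottom pattern
print(overlap_matches(pattern, 2, 1))
```

When the check succeeds, the number of matched bits is `j - shift`, which is 1 in the j = 2 case.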
Similarly, we check for j = 3:
110xxxxx      [top pattern]
 110xxxxx     [centre pattern]
  110xxxxx    [bottom pattern]

Here we consider the first three bits of the patterns. There are three rows now because we need to make the
first bit in the last row (i.e. the bottom pattern) coincide with bit number j (i.e. 3) in the top pattern.
So now we look at the overlapping bits of the top pattern with each of the other patterns. We can see that
the second and third bits of the top pattern (i.e. 1 0) overlap with the first two bits of the centre
pattern (i.e. 1 1). But for a match, ALL these bits have to match, so 1 0 does not match 1 1. Next we
check the bottom pattern against the top pattern. Here the third bit of the top pattern (i.e. 0) overlaps
with the first bit of the bottom pattern (i.e. 1). Again this is not a match. So when j = 3, next[j] = 0
(it's 0 because we couldn't find a match).
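The row-by-row comparison can be reproduced with a short Python sketch. The helper name `shifted_rows` is made up for illustration. It lists, for every shift from 1 to j - 1, the overlapping bits of the top pattern, the overlapping bits of the shifted copy, and whether they agree completely.

```python
def shifted_rows(pattern: str, j: int) -> list:
    """For each shift 1..j-1 return (shift, top bits, bottom bits, all bits match?)."""
    rows = []
    for shift in range(1, j):
        top = pattern[shift:j]        # part of the top pattern above the shifted copy
        bottom = pattern[:j - shift]  # leading bits of the shifted copy
        rows.append((shift, top, bottom, top == bottom))
    return rows


pattern, j = "11011100", 3
print(pattern[:j], "(top pattern)")
for shift, top, bottom, ok in shifted_rows(pattern, j):
    # indent each copy by its shift, mirroring the staircase layout in the text
    print(" " * shift + pattern[:j], f"top {top} vs bottom {bottom}:",
          "match" if ok else "no match")
```

For j = 3 neither row matches, which agrees with next[3] = 0.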
Similarly, for j = 4, we compare the first four bits of all the patterns.
1101xxxx       [top pattern]
 1101xxxx      [second pattern]
  1101xxxx     [third pattern]
   1101xxxx    [bottom pattern]

Firstly, check the overlapping bits of the top pattern with the second pattern. We see that 1 0 1 of the top
pattern overlaps with 1 1 0 of the second pattern. But although the first bits match, we do not consider it a
proper match unless ALL the three bits match. We move on to compare the top pattern with the third
pattern. We see that 0 1 of the top pattern overlaps with 1 1 of the third pattern. Still no match, so we
consider the bottom pattern and the top pattern. Here, the last bit 1 of the top pattern matches with the
first bit 1 of the bottom pattern. Since there is 1 matching bit, for j = 4, next[j] = 1.
Again, for j = 5, we have:
11011xxx        [top pattern]
 11011xxx       [second pattern]
  11011xxx      [third pattern]
   11011xxx     [fourth pattern]
    11011xxx    [bottom pattern]

Compare top and second: 1 0 1 1 and 1 1 0 1 is not a match. Compare top and third: 0 1 1 and 1 1 0,
still no match. Compare top and fourth: 1 1 and 1 1, so we've found a match of 2 bits. If we continue
and compare the top and bottom patterns, we will find a match of 1 bit. But since we already found a match
of 2 bits between the top and fourth patterns, we ignore the shorter 1-bit match between the top and
bottom patterns: we always keep the longest match. So for j = 5, next[j] = 2.
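The rule of keeping the longest match can be sketched in Python as follows. The function name `next_value` is hypothetical. It tries the shifts from smallest to largest, so the first shift that matches completely gives the longest overlap, and later (shorter) matches are never looked at.

```python
def next_value(pattern: str, j: int) -> int:
    """Longest overlap of the first j bits of the pattern with a shifted copy of itself."""
    for shift in range(1, j):  # smallest shift first = longest overlap first
        if pattern[shift:j] == pattern[:j - shift]:
            return j - shift   # number of bits that matched
    return 0                   # no shift produced a complete match


print(next_value("11011100", 5))  # 2
```

For j = 5 the shifts of 1 and 2 fail, the shift of 3 matches on 1 1, and the function returns 2 without examining the 1-bit match at shift 4.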
Do the same for the next 2 steps (i.e. up to j = 7, not j = 8). We can create our next[j] table based on the
above results to get:
j          0    1    2    3    4    5    6    7
next[j]   -1    0    1    0    1    2    2    3

For j = 0, next[j] is always -1 (see lecture notes p. 288 for an explanation).
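As a cross-check, the whole table can be rebuilt with a brute-force Python sketch that follows the hand procedure directly. The function name `build_next_table` is an assumption for illustration, and this is not the linear-time construction usually given for KMP; it simply tries every overlap length for every j.

```python
def build_next_table(pattern: str) -> list:
    """Build next[0..len(pattern)-1] by brute-force self-comparison."""
    table = [-1]  # next[0] is -1 by convention
    for j in range(1, len(pattern)):
        longest = 0
        for k in range(j - 1, 0, -1):  # k = candidate overlap length, longest first
            if pattern[:k] == pattern[j - k:j]:
                longest = k
                break
        table.append(longest)
    return table


print(build_next_table("11011100"))  # [-1, 0, 1, 0, 1, 2, 2, 3]
```

The output agrees with the table above, including the entries for j = 6 and j = 7 that were left as an exercise.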