Regular expressions rest on automata theory and formal languages. The idea behind a Regular
Expression (RE) is similar to that of the Knuth-Morris-Pratt (KMP) algorithm, but a regular
expression searches a string for any text that matches a given pattern, while KMP can only find one
fixed substring. To do that, we need a more expressive abstract machine, one that supports the
necessary logic such as ‘or’ and repetition. We can construct such a machine by implementing a
Nondeterministic Finite Automaton (NFA) on a digraph. The term nondeterministic means that a state
may have several possible transitions to follow, in contrast to a DFA, where each state has exactly
one transition per input character. Just as arithmetic expressions are built from a few operator
symbols, patterns are written in a small syntax of their own. The table below gives four patterns
with examples:
The basic NFA construction consists of a digraph G of epsilon-transitions (edges followed without
scanning a text character). The basic rules for constructing the NFA for an RE are:
• Concatenation: each matched character moves to the next state (a match transition).
• Parentheses: use a stack. Push each left parenthesis onto the stack and pop each time a right
parenthesis is encountered.
• Closure: handle two cases, ‘*’ after a single character or ‘*’ after a right parenthesis.
• Or: add two epsilon-transitions.
pc = new Bag<Integer>();
dfs = new DirectedDFS(G, match);
return false;
}
}
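The fragment above is the tail of the simulation loop: after each text character is scanned, the set of matched states is expanded to every state reachable through epsilon-transitions. The following is a minimal sketch of that reachability step only. It assumes the digraph is stored as plain adjacency lists instead of the `Digraph` and `DirectedDFS` classes the fragment relies on; the class name `EpsilonReach` and the method `reachable` are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class EpsilonReach {
    // Marks every vertex reachable from any of the sources by following
    // directed edges. Assumption: this stands in for what a multi-source
    // directed DFS such as DirectedDFS(G, match) computes.
    public static boolean[] reachable(List<List<Integer>> adj, Iterable<Integer> sources) {
        boolean[] marked = new boolean[adj.size()];
        Deque<Integer> stack = new ArrayDeque<>();
        for (int s : sources) {
            if (!marked[s]) { marked[s] = true; stack.push(s); }
        }
        // Iterative depth-first search from all sources at once.
        while (!stack.isEmpty()) {
            int v = stack.pop();
            for (int w : adj.get(v)) {
                if (!marked[w]) { marked[w] = true; stack.push(w); }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        // Toy digraph (illustrative only): 0->1, 1->2, 3->0; vertex 4 is isolated.
        List<List<Integer>> adj = List.of(
            List.of(1), List.of(2), List.<Integer>of(), List.of(0), List.<Integer>of());
        boolean[] marked = reachable(adj, List.of(0));
        System.out.println(Arrays.toString(marked)); // [true, true, true, false, false]
    }
}
```

Starting from several sources at once keeps one simulation step proportional to the size of the digraph, because every vertex is marked at most once no matter how many states matched.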
The critical step in constructing the NFA is handling a right parenthesis. As in Dijkstra's
two-stack algorithm, we push the metacharacters ‘(’ and ‘|’ onto a stack as we scan the pattern; in
the standard construction closure ‘*’ needs no stack entry, because it is detected by looking one
character ahead. When we hit a right parenthesis ‘)’, we pop the stack. If the popped character is
‘|’, we pop once more to reach the matching ‘(’ and add the two epsilon-transitions that implement
the or operation.
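The construction rules can be sketched as a single pass over the pattern. This is a minimal sketch under stated assumptions, not necessarily the implementation used here: it pushes only the positions of ‘(’ and ‘|’ on the stack, detects ‘*’ by one-character lookahead, and assumes a well-formed pattern with at most one ‘|’ per parenthesized group. The class name `NfaBuilder` and the method `epsilonGraph` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class NfaBuilder {
    // Builds the epsilon-transition digraph for a regular expression.
    // Vertex i corresponds to re.charAt(i); vertex re.length() is the accept state.
    public static List<List<Integer>> epsilonGraph(String re) {
        int m = re.length();
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v <= m; v++) adj.add(new ArrayList<>());
        Deque<Integer> ops = new ArrayDeque<>();   // positions of '(' and '|'
        for (int i = 0; i < m; i++) {
            char c = re.charAt(i);
            int lp = i;                            // start of the operand that ends at i
            if (c == '(' || c == '|') {
                ops.push(i);
            } else if (c == ')') {
                int or = ops.pop();
                if (re.charAt(or) == '|') {
                    lp = ops.pop();                // matching left parenthesis
                    adj.get(lp).add(or + 1);       // skip to the second alternative
                    adj.get(or).add(i);            // leave the first alternative
                } else {
                    lp = or;                       // plain group, no or operator
                }
            }
            // Closure by lookahead: after a single character or a right parenthesis.
            if (i < m - 1 && re.charAt(i + 1) == '*') {
                adj.get(lp).add(i + 1);
                adj.get(i + 1).add(lp);
            }
            // Metacharacters advance to the next state without scanning text.
            if (c == '(' || c == '*' || c == ')') adj.get(i).add(i + 1);
        }
        return adj;
    }

    public static void main(String[] args) {
        // Prints [[1], [2, 6], [3], [2, 4], [], [8], [], [], [9], [], [11], []]
        System.out.println(epsilonGraph("((A*B|AC)D)"));
    }
}
```

For the pattern `((A*B|AC)D)`, the right parenthesis at position 8 pops the ‘|’ at position 5 and then the ‘(’ at position 1, which yields the two or edges 1→6 and 5→8.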
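The total running time of our NFA-based Regular Expression implementation is proportional to MN in
the worst case, where M is the pattern length and N is the text length: constructing the NFA takes
time proportional to M, and scanning the text takes N steps, each of which may visit all M states.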
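The bound can be broken down as follows. This is a sketch that assumes M is the pattern length, N is the text length, and the construction adds at most three epsilon-transitions per pattern character, so the digraph has M + 1 vertices and at most 3M edges; the symbols T_build, T_step, and T_total are introduced here for illustration.

```latex
\begin{align*}
T_{\text{build}} &= O(M) \\
T_{\text{step}}  &= O\big((M + 1) + 3M\big) = O(M) \\
T_{\text{total}} &= T_{\text{build}} + N \cdot T_{\text{step}} = O(M) + O(MN) = O(MN)
\end{align*}
```

Each of the N steps is one reachability search over the digraph, which is why the cost per scanned character is proportional to M and not constant.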