Computational Linguistics II: Parsing Tomita's Parser: Laura Kallmeyer University of T Ubingen Winter Term 2006/2007

Kallmeyer CL II: Parsing Kallmeyer CL II: Parsing
Motivation
LR-parsing with one lookahead is deterministic for LR(1)
grammars. But there are CFLs that cannot be generated by
LR(1)-grammars.
Computational Linguistics II: Parsing
If a grammar is not LR(1), we can construct a LR(1) parse
Tomita’s Parser table with more than one entry in some of the fields. This can
be used for non-deterministic parsing.
Laura Kallmeyer However, since we don’t have tabulation, partial results get
University of Tübingen computed several times and the complexity is exponential.
Winter term 2006/2007 Tomita’s idea: Use a graph-structured stack to avoid
computing partial results more than once.
Tomita’s Parser 1 4 February 2008 Tomita’s Parser 3 4 February 2008
Graph-structured stack (1)

The stack is a directed acyclic graph (dag) with the leaves being
the topmost elements.
For every word in the input, before processing that word, we have k
Overview possible states.
1. Motivation We first do the possible reductions for each of the states while
leaving the original stack if there is a shift possible. In case of a
2. Graph-structured stack
reduce/reduce or shift/reduce conflict, we branch. If several
3. The parse forest branches lead to the same states, we identify these.
4. Conclusion We repeat this until no more reductions are possible.
We then do the possible shifts. Again, if several lead to the
same states, we identify these.

Graph-structured stack (2) The parse forest (1)

Example: 1. S → AB, 2. S → SC, 3. B → BC, The dag-structure avoids an explosion in the number of stacks.
4. A → a, 5. B → b, 6. C → c However, we can still have exponentially many parse trees for a
a b c $ A B C S given input.
0 s4 1 2 Therefore, a compact representation of parse forests is needed.
1 s5 3
Tomita uses two techniques: sub-tree sharing and local
2 s6 acc 7 ambiguity packing.
Parse 3 r1, s6 r1 8
table: 4 r4
5 r5 r5
6 r6 r6
7 r2 r2
8 r3 r3
Graph-structured stack (3) The parse forest (2)

For input w = abcc, at some point (after shifting the first c) the Example: Take the preceding grammar, w = abcc
stack is the following: Three parse trees:
S 2 S S S
0 c 6 A B S C S C
A 1 B 3 a B C A B c S C c
B C c a B C A B c
b c b c a b
Sub-tree sharing: Common sub-trees are represented only once.

The parse forest (3) The parse forest (5)

Result of sub-tree sharing: Packed parse forests are easy to construct within an LR-parser
with graph-structured stack: Whenever a subtree is shared or
S S S
different subtrees ar packed into one node, there will be a
S S B corresponding shared node in the stack graph.
S B Example: w = abc.
A B C C Stack analysis
0 s4
a b c c
0 1 4 r4 1: a
Local ambiguity packing: whenever the same category spans the 0 2 1 s5 2: A( 1 )
same input (possibly with different analyses), the corrponding
0 2 1 3 5 r5 3: b
nodes are put into one packed node.
0 2 1 4 3 r1, s6 4: B( 3 )
The parse forest (4) The parse forest (6)

Result of local ambiguity packing: 0 2 1 4 3 s6
S S 5 2 s6 5: S( 2 , 4 )
0 2 1 4 3 6 6 r6
S S B
5 2 6: c
S B
0 2 1 4 3 7 8 r3
A B C C 5 2 7 7 r2 7: C( 6 )
a b c c 0 2 1 8 3 r1 8 :B( 4 , 7 )
9 2 acc 9: S( 5 , 7 )
0 11 2 acc 10 :S( 2 , 8 ), 11 :[ 10 , 9 ]

Kallmeyer CL II: Parsing
Conclusion
Tomita’s algorithm
is a general LR(1) parser that works for every CFG;
uses a graph-structured stack to avoid the explosion otherwise
linked to non-determinism;
uses a compact parse forest representation to avoid the
explosion arising from ambiguous grammars.
Tomita’s Parser 13 4 February 2008

Computational Linguistics II: Parsing Tomita's Parser: Laura Kallmeyer University of T Ubingen Winter Term 2006/2007

Загружено:

Сведения о документе

Исходное описание:

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Computational Linguistics II: Parsing Tomita's Parser: Laura Kallmeyer University of T Ubingen Winter Term 2006/2007

Загружено:

Авторское право:

Доступные форматы

Kallmeyer CL II: Parsing Kallmeyer CL II: Parsing

Tomita’s Parser 1 4 February 2008 Tomita’s Parser 3 4 February 2008

Kallmeyer CL II: Parsing Kallmeyer CL II: Parsing

Graph-structured stack (1)

Tomita’s Parser 2 4 February 2008 Tomita’s Parser 4 4 February 2008

Graph-structured stack (2) The parse forest (1)

Tomita’s Parser 5 4 February 2008 Tomita’s Parser 7 4 February 2008

Kallmeyer CL II: Parsing Kallmeyer CL II: Parsing

Graph-structured stack (3) The parse forest (2)

Tomita’s Parser 6 4 February 2008 Tomita’s Parser 8 4 February 2008

The parse forest (3) The parse forest (5)

Tomita’s Parser 9 4 February 2008 Tomita’s Parser 11 4 February 2008

Kallmeyer CL II: Parsing Kallmeyer CL II: Parsing

The parse forest (4) The parse forest (6)

Tomita’s Parser 10 4 February 2008 Tomita’s Parser 12 4 February 2008

Tomita’s Parser 13 4 February 2008

Вам также может понравиться