
Lecture 5

• Parsing (Chapter 13 of J&M)


 Skip 13.4.3 (chart parsing)
 Read 13.5.2 and 13.5.3 yourselves

Parsing

• Both CKY and Earley are dynamic programming solutions that run in O(n^3) time.
 CKY is bottom-up (going from words to S)
 Earley is top-down (going from S to words)
• Don’t forget alternative views in Nugues!
• Computer languages are typically parsed using another algorithm called shift-reduce (a bottom-up parser)
Sample Grammar

Dynamic Programming

• DP methods fill tables with partial results and
 Avoid doing repeated work
 Solve exponential problems in polynomial time (sort of)
 Efficiently store ambiguous structures with shared sub-parts.

CKY Parsing

• First we’ll limit our grammar to epsilon-free, binary rules (more later)
• Consider the rule A -> B C
 If there is an A in the input then there must be a B followed by a C in the input.
 If the A spans from i to j in the input then there must be some k s.t. i < k < j
 I.e., the B splits from the C someplace.

Note on index convention

• Indices run from 0 to n
• Think of them as the inter-word spaces
 0 = before the 1st word
 1 = before the 2nd word (or AFTER the 1st word)
 n = after the nth word
• 0 The 1 book 2 is 3 on 4 the 5 table 6
• A rule that spans 0 to 6 will span the whole sentence (see the sketch below)
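
A minimal Python sketch of this fencepost convention, using the example sentence above; a span (i, j) is simply the slice words[i:j]:

```python
words = "The book is on the table".split()

# 0 The 1 book 2 is 3 on 4 the 5 table 6
print(words[0:6])   # the whole sentence: span (0, 6)
print(words[1:2])   # ['book']: span (1, 2)
```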
CKY

• So let’s build a table so that an A spanning from i to j in the input is placed in cell [i,j] in the table.
• So a non-terminal spanning the entire string will sit in cell [0, n]
• If we build the table bottom up we’ll know that the parts of the A must go from i to k and from k to j

CKY

• Meaning that for a rule like A -> B C we should look for a B in [i,k] and a C in [k,j].
• In other words, if we think there might be an A spanning i,j in the input… AND
• A -> B C is a rule in the grammar THEN
• There must be a B in [i,k] and a C in [k,j] for some i<k<j

CKY

• So to fill the table, loop over the cell [i,j] values in some systematic way
 What constraint should we put on that?
 For each cell, loop over the appropriate k values to search for things to add.
• Note the binary constraint: each rule has exactly 2 non-terminals in its RHS

CKY Table

CKY Algorithm

• Iterate over columns (left to right)
• Within each column, iterate over rows (from the bottom up)
• For each cell, iterate over all possible splits k (a sketch in code follows below)
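
A minimal sketch of that loop structure as a CKY recognizer in Python. The toy CNF grammar below (a `unary` word dictionary plus a `binary` rule dictionary) is made up for illustration; it is not the lecture’s sample grammar.

```python
from collections import defaultdict

# Toy CNF grammar (illustrative only): unary maps a word to the non-terminals
# that can derive it; binary maps a pair (B, C) to the set of A with A -> B C.
unary = {"book": {"V", "N"}, "that": {"Det"}, "flight": {"N"}}
binary = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP", "S"}}

def cky_recognize(words, start="S"):
    n = len(words)
    table = defaultdict(set)        # table[(i, j)]: non-terminals over words[i:j]

    for j in range(1, n + 1):                 # iterate over columns, left to right
        table[(j - 1, j)] = set(unary.get(words[j - 1], set()))
        for i in range(j - 2, -1, -1):        # iterate over rows, bottom up
            for k in range(i + 1, j):         # iterate over all possible splits
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        table[(i, j)] |= binary.get((B, C), set())
    return start in table[(0, n)]

print(cky_recognize("book that flight".split()))   # True
```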

Note

• We arranged the loops to fill the table a column at a time, from left to right, bottom to top.
 This assures us that whenever we’re filling a cell, the parts needed to fill it are already in the table (to the left and below)

Example

CKY Parsing

• Is that really a parser?
• Recognizing != parsing
• Need to keep track of backpointers (a sketch follows below)
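
One hedged way to add backpointers, extending the recognizer sketch from the CKY Algorithm slide: each cell entry also records which split point and which pair of children produced it, and a tree is rebuilt by following those records from (S, 0, n). The grammar is the same illustrative toy, and `cky_parse`/`build` are invented names, not the textbook’s.

```python
# The same illustrative toy grammar as in the recognizer sketch above.
unary = {"book": {"V", "N"}, "that": {"Det"}, "flight": {"N"}}
binary = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP", "S"}}

def cky_parse(words, start="S"):
    n = len(words)
    # table[(i, j)] maps each non-terminal A spanning words[i:j] to a
    # backpointer: None for a word, or (k, B, C) meaning A -> B C with
    # B in [i,k] and C in [k,j].
    table = {}
    for j in range(1, n + 1):
        table[(j - 1, j)] = {A: None for A in unary.get(words[j - 1], set())}
        for i in range(j - 2, -1, -1):
            cell = table.setdefault((i, j), {})
            for k in range(i + 1, j):
                for B in table.get((i, k), {}):
                    for C in table.get((k, j), {}):
                        for A in binary.get((B, C), set()):
                            cell.setdefault(A, (k, B, C))   # keep one analysis

    def build(A, i, j):
        """Follow backpointers to rebuild one tree rooted in A over [i, j]."""
        bp = table[(i, j)][A]
        if bp is None:
            return (A, words[i])
        k, B, C = bp
        return (A, build(B, i, k), build(C, k, j))

    return build(start, 0, n) if start in table.get((0, n), {}) else None

print(cky_parse("book that flight".split()))
# ('S', ('V', 'book'), ('NP', ('Det', 'that'), ('N', 'flight')))
```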

Other Ways to Do It?

• Are there any other sensible ways to fill the table that still guarantee that the cells we need are already filled?

Other Ways to Do It?

Sample Grammar

Problem

• What if your grammar isn’t binary?
 As in the case of the TreeBank grammar?
• Convert it to binary… any arbitrary CFG can be rewritten into Chomsky Normal Form automatically.
• What does this mean?
 The resulting grammar accepts (and rejects) the same set of strings as the original grammar.
 But the resulting derivations (trees) are different.

Problem

• More specifically, rules have to be of the form
A -> B C
or
A -> w
• That is, rules can expand to either 2 non-terminals or to a single terminal.

Binarization Intuition

• Eliminate chains of unit productions.
• Introduce new intermediate non-terminals into the grammar that distribute rules with length > 2 over several rules. So…
S -> A B C
 Turns into
S -> X C
X -> A B
Where X is a symbol that doesn’t occur anywhere else in the grammar. (A sketch follows below.)
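
A rough sketch of just this step (distributing long right-hand sides over fresh non-terminals); removing unit productions and handling mixed terminal/non-terminal rules are separate steps not shown. The X1, X2, … naming scheme is an arbitrary choice assumed not to clash with existing symbols.

```python
def binarize(rules):
    """Rewrite rules with more than two symbols on the RHS into chains of
    binary rules, following the slide's pattern: S -> A B C becomes
    X -> A B and S -> X C, where X is a fresh non-terminal."""
    out, counter = [], 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            new = f"X{counter}"    # assumed not to occur anywhere else
            out.append((new, [rhs[0], rhs[1]]))
            rhs = [new] + rhs[2:]
        out.append((lhs, rhs))
    return out

print(binarize([("S", ["A", "B", "C"])]))
# [('X1', ['A', 'B']), ('S', ['X1', 'C'])]
```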

CNF Conversion

CKY Algorithm

Example

Filling column 5

Example

Example

Example

Example

CKY Notes

• Since it’s bottom-up, CKY populates the table with a lot of phantom constituents.
 Segments that by themselves are constituents but cannot really occur in the context in which they are being suggested.
 To avoid this we can switch to a top-down control strategy, or
 We can add some kind of filtering that blocks constituents where they cannot happen in a final analysis.
Earley Parsing

• Allows arbitrary CFGs
• Top-down control
• Fills a table in a single sweep over the input words
 Table is length N+1; N is number of words
 Table entries represent
 Completed constituents and their locations
 In-progress constituents
 Predicted constituents

States

• The table entries are called states and are represented with dotted rules.
S -> · VP              A VP is predicted
NP -> Det · Nominal    An NP is in progress
VP -> V NP ·           A VP has been found

States/Locations

• S -> · VP [0,0]              A VP is predicted at the start of the sentence
• NP -> Det · Nominal [1,2]    An NP is in progress; the Det goes from 1 to 2
• VP -> V NP · [0,3]           A VP has been found starting at 0 and ending at 3

Graphically

Earley

• As with most dynamic programming approaches, the answer is found by looking in the table in the right place.
• In this case, there should be an S state in the final column that spans from 0 to n and is complete.
• If that’s the case you’re done.
 S -> α · [0,n]
Earley

• So sweep through the table from 0 to n…
 New predicted states are created by starting top-down from S
 New incomplete states are created by advancing existing states as new constituents are discovered
 New complete states are created in the same way.

Earley

• More specifically (sketched in code below)…
1. Predict all the states you can upfront
2. Read a word
   a. Extend states based on matches
   b. Generate new predictions
   c. Go to step 2
3. Look at column n to see if you have a winner
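
A hedged sketch of these steps as an Earley recognizer in Python. The `State` tuple, the `GAMMA` dummy start state, and the toy grammar and lexicon are illustrative choices (each word gets a single part of speech here, which sidesteps lexical ambiguity); this is not the textbook’s pseudocode.

```python
from collections import namedtuple

# A state is a dotted rule plus where it started: lhs -> rhs with the dot at
# position `dot`, spanning the input from `start` to the current chart column.
State = namedtuple("State", "lhs rhs dot start")

# Toy grammar and lexicon, illustrative only.
GRAMMAR = {"S": [["VP"]], "VP": [["V", "NP"]], "NP": [["Det", "N"]]}
LEXICON = {"book": "V", "that": "Det", "flight": "N"}

def earley_recognize(words, start="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]        # chart[i]: states ending at position i

    def add(i, state):
        if state not in chart[i]:
            chart[i].append(state)

    add(0, State("GAMMA", ("S",), 0, 0))      # dummy start state: GAMMA -> . S

    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):              # chart[i] may grow while we scan it
            st = chart[i][j]
            j += 1
            if st.dot < len(st.rhs):
                nxt = st.rhs[st.dot]
                if nxt in GRAMMAR:            # PREDICTOR: expand a non-terminal top-down
                    for rhs in GRAMMAR[nxt]:
                        add(i, State(nxt, tuple(rhs), 0, i))
                elif i < n and LEXICON.get(words[i]) == nxt:
                    # SCANNER: the next word matches the expected part of speech
                    add(i + 1, State(st.lhs, st.rhs, st.dot + 1, st.start))
            else:
                # COMPLETER: advance every state that was waiting for st.lhs
                for old in chart[st.start]:
                    if old.dot < len(old.rhs) and old.rhs[old.dot] == st.lhs:
                        add(i, State(old.lhs, old.rhs, old.dot + 1, old.start))

    return any(s.lhs == "GAMMA" and s.dot == 1 for s in chart[n])

print(earley_recognize("book that flight".split()))   # True
```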

Earley Code

Earley Code

Example

• Book that flight
• We should find… an S from 0 to 3 that is a completed state…

Example

Add To Chart

Example

Example

Efficiency

• For such a simple example, there seems to be a lot of useless stuff in there.
• Why?
• It’s predicting things that aren’t consistent with the input
• That’s the flip side of the CKY problem.

Details

• As with CKY, that isn’t a parser until we add backpointers so that each state knows where it came from.

Back to Ambiguity

• Did we solve it?

Ambiguity

Ambiguity

• No…
 Both CKY and Earley will result in multiple S structures for the [0,n] table entry.
 They both efficiently store the sub-parts that are shared between multiple parses.
 And they obviously avoid re-deriving those sub-parts.
 But neither can tell us which one is right.

Ambiguity

• In most cases, humans don’t notice incidental ambiguity (lexical or syntactic); it is resolved on the fly.
• We’ll try to model that with probabilities.
• But note something odd and important about the Groucho Marx example…

Full Syntactic Parsing

• Probably necessary for deep semantic analysis of texts (as we’ll see).
• Probably not practical for many applications (given typical resources)
 O(n^3) for straight parsing
 O(n^5) for probabilistic versions
 Too slow for applications that need to process texts in real time (search engines)
 Or that need to deal with large volumes of new material over short periods of time
Partial Parsing

• For many applications you don’t really need a full-blown syntactic parse. You just need a good idea of where the base syntactic units are.
 Often referred to as chunks.
• For example, if you’re interested in locating all the people, places and organizations in a text, it might be useful to know where all the NPs are.
Examples

• The first two are examples of full partial parsing or chunking. All of the elements in the text are part of a chunk. And the chunks are non-overlapping.
• Note how the second example has no hierarchical structure.
• The last example illustrates base-NP chunking. Ignore anything that isn’t in the kind of chunk you’re looking for.

Partial Parsing

• Two approaches
 Rule-based (hierarchical) transduction.
 Statistical sequence labeling
 HMMs
 MEMMs

Rule-Based Partial Parsing

• Restrict the form of rules to exclude recursion (make the rules flat).
• Group and order the rules so that the RHS of the rules can refer to non-terminals introduced in earlier transducers, but not later ones.
• Combine the rules in a group.
• Combine the groups into a cascade… (a sketch of one flat rule follows below)
• Then compose, determinize and minimize the whole thing (optional).
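
A hedged sketch of what one such flat, non-recursive rule might look like, written here as a plain regular expression over POS tag sequences rather than a compiled transducer; the tag set, the Det? Adj* Noun+ pattern, and the function name are illustrative only.

```python
import re

def chunk_base_nps(tagged):
    """tagged: a list of (word, POS) pairs.  Returns (start, end) token spans
    for base NPs matching the flat, non-recursive pattern  Det? Adj* Noun+ ."""
    # Encode the tag sequence as a string so the rule can be a plain regex,
    # roughly in the spirit of a finite-state transduction over tags.
    tag_string = "".join(f"({tag})" for _, tag in tagged)
    spans = []
    for m in re.finditer(r"(\(DT\))?(\(JJ\))*(\(NNS?\)|\(NNP\))+", tag_string):
        # Map character offsets back to token indices by counting "(" markers.
        start = tag_string[:m.start()].count("(")
        end = start + m.group(0).count("(")
        spans.append((start, end))
    return spans

tagged = [("the", "DT"), ("morning", "NN"), ("flight", "NN"),
          ("leaves", "VBZ"), ("Denver", "NNP")]
print(chunk_base_nps(tagged))   # [(0, 3), (4, 5)]
```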

Typical Architecture

• Phase 1: Part of speech tags
• Phase 2: Base syntactic phrases
• Phase 3: Larger verb and noun groups
• Phase 4: Sentential level rules

Partial Parsing

• No direct or indirect recursion allowed in these rules.
• That is, you can’t directly or indirectly reference the LHS of the rule on the RHS.

Cascaded Transducers

Partial Parsing

• This cascaded approach can be used to find the sequence of flat chunks you’re interested in.
• Or it can be used to approximate the kind of hierarchical trees you get from full parsing with a CFG.

