Академический Документы
Профессиональный Документы
Культура Документы
Lecture 3
Morphology and Finite State Machines
Sharon Goldwater
(based on slides by Philipp Koehn)
22 September 2016
Sharon Goldwater
ANLP Lecture 3
22 September 2016
Lecture recording
I am testing out recording equipment today.
If all goes well, this and following lectures will be recorded.
Audio and slides only in this room, Video in Tue room.
I will try to remember to repeat questions so they are recorded.
Sharon Goldwater
ANLP Lecture 3
Recap: Tasks
Recognition
given: surface form
wanted: yes/no decision if it is in the language
Generation
given: lemma and morphological properties
wanted: surface form
Analysis
given: surface form
wanted: lemma and morphological properties
Sharon Goldwater
ANLP Lecture 3
Sharon Goldwater
ANLP Lecture 3
START
c
c
END
Sharon Goldwater
ANLP Lecture 3
One Word
walk
Sharon Goldwater
ANLP Lecture 3
walk
+ed
Sharon Goldwater
ANLP Lecture 3
walk
+ed
+ing
Sharon Goldwater
ANLP Lecture 3
laugh
+s
walk
+ed
report
+ing
Multiple stems
implements regular verb morphology
laughs, laughed, laughing
walks, walked, walking
reports, reported, reporting
Sharon Goldwater
ANLP Lecture 3
laugh
+s
walk
+ed
report
+ing
Multiple stems
implements regular verb morphology
what about bake, bite?
more on this later...
Sharon Goldwater
ANLP Lecture 3
Derivational Morphology
ion
word
START
double lines = end state
Sharon Goldwater
ANLP Lecture 3
10
Derivational Morphology
ion
word
START
Sharon Goldwater
ANLP Lecture 3
11
Derivational Morphology
ion
word
fy
START
again: wordify not wordyfy!
again, will come back to that later...
Sharon Goldwater
ANLP Lecture 3
12
Derivational Morphology
ion
word
fy
cate
START
why a loop? could it be placed differently?
Sharon Goldwater
ANLP Lecture 3
13
Derivational Morphology
ion
word
START
Sharon Goldwater
ism
fy
ist
cate
er
ANLP Lecture 3
14
y
N
START
ism
fy
A
ist
er
Sharon Goldwater
cate
ANLP Lecture 3
15
y
N
START
ism
fy
cate
A
ist
er
ANLP Lecture 3
16
Concatenation
Constructing an FSA gets very complicated
Build components as separate FSAs
L: FSA for lexicon
D: FSA for derivational morphology
I: FSA for inflectional morphology
Concatenate L + D + I (there are standard algorithms)
In fact, each component may consist of multiple components
(e.g., D has different sets of affixes with ordering constraints)
Sharon Goldwater
ANLP Lecture 3
17
What is Required?
Lexicon of lemmas
very large, needs to be collected by hand
Inflection and derivation rules
not large, but requires understanding of the language
Recent work: automatically learn lemmas and suffixes from a corpus
OK solution for languages with few resources
Hand-engineered systems much better when available
Sharon Goldwater
ANLP Lecture 3
18
Sharon Goldwater
ANLP Lecture 3
19
Sharon Goldwater
laugh
walk
ed
report
ing
ANLP Lecture 3
20
Schematically
s
verbreg
ed
ing
Sharon Goldwater
ANLP Lecture 3
21
+V:
+past:ed
+prog:ing
Sharon Goldwater
ANLP Lecture 3
22
Sharon Goldwater
ANLP Lecture 3
23
Sharon Goldwater
ANLP Lecture 3
24
+V:^
+past:ed#
+prog:ing#
ANLP Lecture 3
25
e:
e,other
^:
ed#:ed
other
where other means any character except e.
Examples: walked# walked, bakeed# baked
A nondeterministic FST: mulitple transitions may be possible
on the same input (where?).
If any path goes to end state, string is accepted.
Sharon Goldwater
ANLP Lecture 3
26
cats#
ANLP Lecture 3
foxs#
axles#
27
Sharon Goldwater
ANLP Lecture 3
usually in a
28
Carmel Toolkit
http://www.isi.edu/licensed-sw/carmel/
FSA Toolkit
http://www-i6.informatik.rwth-aachen.de/kanthak/fsa.html
Sharon Goldwater
ANLP Lecture 3
29
Sharon Goldwater
ANLP Lecture 3
30
A L L
A L L
deletion
A B L
substitution
A B L E insertion
Sharon Goldwater
ANLP Lecture 3
31
Example alignments
Let ins/del cost (distance) = 1, sub cost = 2 (0 if no change)
(can use other costs, incl diff costs for diff chars)
Two optimal alignments (cost = 4):
S T A L L d | | s | i
- T A B L E
Sharon Goldwater
ANLP Lecture 3
S T A - L L
d | | i | s
- T A B L E
32
Example alignments
Let ins/del cost (distance) = 1, sub cost = 2 (0 if no change)
(can use other costs, incl diff costs for diff chars)
Two optimal alignments (cost = 4):
S T A L L d | | s | i
- T A B L E
S T A - L L
d | | i | s
- T A B L E
ANLP Lecture 3
S T A L - L d d s s i | i
- - T A B L E
33
Sharon Goldwater
ANLP Lecture 3
34
ANLP Lecture 3
35
Intuition
Minimum distance D(stall, table) must be the minimum of:
D(stall, tabl) + 1 (ins)
D(stal, table) + 1 (del)
D(stal, tabl) + 2 (sub)
Similarly for the smaller subproblems
So proceed as follows:
solve smallest subproblems first
store solutions in a table (chart)
use these to solve and store larger subproblems until we get the
full solution
Sharon Goldwater
ANLP Lecture 3
36
Sharon Goldwater
ANLP Lecture 3
37
ANLP Lecture 3
38
ANLP Lecture 3
39
Sharon Goldwater
ANLP Lecture 3
40
Chart: completed
0
S 1
T 2
A 3
L 4
L 5
Sharon Goldwater
T A B L E
1 2 3 4 5
2 3 4 5 6
1 2 3 4 5
2 1 2 3 4
3 2 3 2 3
4 3 4 3 4
ANLP Lecture 3
41
To find alignments
0
S 1
T 2
A 3
L 4
L 5
T
A
B
L
E
1
2
3
4
5
-2 -3 -4 -5 -6
-1
2
3
4
5
2
-1
2
3
4
3
2
-3
-2
3
4
3
-4 -3 -4
ANLP Lecture 3
42
Announcements
Next lecture: Probability models and estimation.
Assumes you know or are getting to grips with the material in
the maths tutorials on the Readings section.
Start working on the tutorial exercise sheets (see web page):
bring questions to groups on Tue.
Questions about reading? Confused by lecture? Post to Piazza!
Remember to sign up using link on course web page.
Tutorials start on Tuesday. I will announce on Monday (and on
web page, Piazza) temporary group assignments for week 2.
Sharon Goldwater
ANLP Lecture 3
43