Acl15slides Handout

Low-Rank Regularization for Sparse Conjunctive Feature Spaces:
An Application to Named Entity Classification
A. Primadhanty (1), X. Carreras (2), A. Quattoni (2)
(1) Universitat Politècnica de Catalunya, (2) Xerox Research Centre Europe
ACL-IJNLP 2015 1 / 28
Challenge
Conjunction of sparse elementary features → very sparse
Example: Named Entity Classification
A shipload of 12 tonnes of rice arrives in [Umm Qasr port] in the Gulf
The left context, the entity and the right context are represented by $\phi_l(l)$, $\phi_e(e)$ and $\phi_r(r)$; each of the three is sparse.
Approaches
$\ell_1$ or $\ell_2$ regularization
Unseen conjunctions?
Contribution
Low-rank regularization for sparse conjunctive feature spaces

Propagate weight to unseen conjunctions
Learning algorithm
Convex relaxation of the low-rank minimization function
Experiments
Improvement over $\ell_1$ and $\ell_2$
Task
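Given:
$x = \langle l, e, r \rangle$, an entity mention $e$ with its left context $l$ and right context $r$
Goal:
Classify $x$ into one entity class $y$ in the set $\mathcal{Y}$
Example: A shipload of 12 tonnes of rice arrives in [Umm Qasr port] in the Gulf
Here the bracketed span is $e$, the text to its left is $l$, the text to its right is $r$, and the class $y$ is to be predicted.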
Classifier
Log-Linear Model
\[ \Pr(y \mid x; \theta) = \frac{\exp\{s_\theta(x, y)\}}{\sum_{y'} \exp\{s_\theta(x, y')\}} \]
$s_\theta : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is a scoring function of entity tuples with a candidate class
$\theta$ are the parameters of this function
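As a minimal sketch of the log-linear model above, the class probabilities can be computed from a vector of scores with a softmax. The function name `class_probabilities` and the example scores are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def class_probabilities(scores):
    """Turn class scores s_theta(x, y) into Pr(y | x; theta) with a softmax."""
    scores = np.asarray(scores, dtype=float)
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    shifted = scores - scores.max()
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Made-up scores for five classes (for example PER, LOC, ORG, MISC, O).
probs = class_probabilities([0.5, 2.0, 1.0, -1.0, 0.0])
predicted_class = int(probs.argmax())
```

The predicted class is simply the one with the highest score, since the normalizer is shared by all classes.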
Scoring Function
Feature-based linear model
\[ s_\theta(x, y) = \phi(x) \cdot w_y \]
$\phi : \mathcal{X} \to \{0, 1\}^n$ is a feature function representing entity tuples in an $n$-dimensional binary feature space
$\theta = \{w_y\}_{y \in \mathcal{Y}}$ are the weight vectors, one per class
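Because $\phi(x)$ is binary, the dot product reduces to summing the weights of the features that fire. A small sketch, assuming the active features are given as a list of indices; the weights, indices and the name `linear_score` are made up for illustration.

```python
import numpy as np

def linear_score(active_features, w_y):
    """s_theta(x, y) = phi(x) . w_y, with binary phi(x) given by its active indices."""
    # With phi(x) in {0, 1}^n, the dot product is the sum of the weights
    # at the positions where phi(x) equals 1.
    return float(np.sum(w_y[list(active_features)]))

n = 6                                               # illustrative feature-space size
w_y = np.array([0.2, -0.1, 0.0, 1.5, 0.0, 0.3])     # made-up weights for one class
active = [0, 3]                                     # features that fire for this x

score = linear_score(active, w_y)

# The same value computed with an explicit binary vector.
phi = np.zeros(n)
phi[active] = 1.0
dense_score = float(phi @ w_y)
```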
Scoring Function
Left-right context model
\[ s_\theta(\langle l, e, r \rangle, y) = \phi_l(l)^\top W_y \, \phi_r(r) \]
$\phi_l(l) \in \mathbb{R}^{d_1}$ is the feature representation of the left context
$\phi_r(r) \in \mathbb{R}^{d_2}$ is the feature representation of the right context
$W_y \in \mathbb{R}^{d_1 \times d_2}$ is the weight matrix for class $y$, such that $\theta = \{W_y\}_{y \in \mathcal{Y}}$
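In this bilinear form, entry $(i, j)$ of $W_y$ acts as the weight of the conjunction of left feature $i$ with right feature $j$. The sketch below checks this on made-up binary vectors and a random matrix; none of the values come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 4, 5

phi_l = np.array([1.0, 0.0, 0.0, 1.0])        # made-up left-context features
phi_r = np.array([0.0, 1.0, 0.0, 0.0, 1.0])   # made-up right-context features
W_y = rng.normal(size=(d1, d2))               # random stand-in for a class matrix

def bilinear_score(phi_l, W_y, phi_r):
    """s_theta(<l, e, r>, y) = phi_l(l)^T W_y phi_r(r)."""
    return float(phi_l @ W_y @ phi_r)

score = bilinear_score(phi_l, W_y, phi_r)

# Equivalent view: an explicit d1 x d2 conjunction feature map weighted entrywise.
conjunctions = np.outer(phi_l, phi_r)
score_from_conjunctions = float(np.sum(conjunctions * W_y))
```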
Low Rank Parameter Matrices
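SVD
\[
W_y =
\underbrace{\begin{bmatrix} u_{11} & \cdots & u_{1k} \\ \vdots & \ddots & \vdots \\ u_{d_1 1} & \cdots & u_{d_1 k} \end{bmatrix}}_{U_y}
\underbrace{\begin{bmatrix} \sigma_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_k \end{bmatrix}}_{\Sigma_y}
\underbrace{\begin{bmatrix} v_{11} & \cdots & v_{1 d_2} \\ \vdots & \ddots & \vdots \\ v_{k1} & \cdots & v_{k d_2} \end{bmatrix}}_{V_y^\top}
\]
Consider that $W_y$ has rank $k$
$U_y \in \mathbb{R}^{d_1 \times k}$ and $V_y \in \mathbb{R}^{d_2 \times k}$ are orthonormal projections
$\Sigma_y \in \mathbb{R}^{k \times k}$ is a diagonal matrix of singular values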
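The decomposition can be illustrated numerically. The sketch below builds a random rank-$k$ matrix (an assumption for the demo, not a trained $W_y$), computes its SVD with NumPy, and keeps the $k$ leading components.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, k = 6, 8, 2

# A random d1 x d2 matrix of rank k, standing in for a learned W_y.
W_y = rng.normal(size=(d1, k)) @ rng.normal(size=(k, d2))

# Thin SVD: W_y = U diag(sigma) Vt, singular values sorted in decreasing order.
U, sigma, Vt = np.linalg.svd(W_y, full_matrices=False)

# Keep only the k leading components; the remaining singular values are ~0.
U_k, sigma_k, Vt_k = U[:, :k], sigma[:k], Vt[:k, :]
W_rebuilt = U_k @ np.diag(sigma_k) @ Vt_k

rank = int(np.linalg.matrix_rank(W_y))
```

Storing `U_k`, `sigma_k` and `Vt_k` takes $k(d_1 + d_2 + 1)$ numbers instead of $d_1 d_2$.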
Score Function - Rewritten
\[
s_\theta(\langle l, e, r \rangle, y) =
\underbrace{\begin{bmatrix} l_1 & \cdots & l_{d_1} \end{bmatrix}}_{\phi_l(l)^\top}
\underbrace{U_y \, \Sigma_y \, V_y^\top}_{\mathrm{SVD}(W_y)}
\underbrace{\begin{bmatrix} r_1 \\ \vdots \\ r_{d_2} \end{bmatrix}}_{\phi_r(r)}
\]
Grouping the factors differently:
\[
s_\theta(\langle l, e, r \rangle, y) =
\left( \phi_l(l)^\top U_y \right) \, \Sigma_y \, \left( V_y^\top \phi_r(r) \right)
\]
Rank $k$ → intrinsic dimensionality of the inner product behind the score function
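The regrouped score can be read as projecting each context into a $k$-dimensional space and taking a weighted inner product there. A short sketch under the same assumptions as before (random rank-$k$ matrix, random binary feature vectors; all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2, k = 6, 8, 2

# Random rank-k stand-in for W_y and its thin SVD.
W_y = rng.normal(size=(d1, k)) @ rng.normal(size=(k, d2))
U, sigma, Vt = np.linalg.svd(W_y, full_matrices=False)
U_k, sigma_k, Vt_k = U[:, :k], sigma[:k], Vt[:k, :]

# Made-up binary feature vectors for the left and right contexts.
phi_l = rng.integers(0, 2, size=d1).astype(float)
phi_r = rng.integers(0, 2, size=d2).astype(float)

left_embedding = phi_l @ U_k       # phi_l(l)^T U_y, a k-dimensional vector
right_embedding = Vt_k @ phi_r     # V_y^T phi_r(r), a k-dimensional vector

# Inner product in the k-dimensional space, weighted by the singular values.
score_low_rank = float(np.sum(left_embedding * sigma_k * right_embedding))
score_full = float(phi_l @ W_y @ phi_r)
```

Both scores agree up to floating-point error, but the low-rank form only touches $k$-dimensional vectors once the projections are computed.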
Adding Entity Features
One parameter matrix per feature tag and class label, i.e. $\theta = \{W_{t,y}\}_{t \in \mathcal{T}, y \in \mathcal{Y}}$
\[ s_\theta(\langle l, e, r \rangle, y) = \sum_{t \in \phi_e(e)} \phi_l(l)^\top W_{t,y} \, \phi_r(r) \]
↓
Parameters: tensor
↓
Rank defined by matricization
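A sketch of the score with entity features for one fixed class $y$, storing the matrices $W_{t,y}$ as slices of a 3-way array. The slides only say that rank is defined through matricization; the particular unfolding used below (stacking the tag slices vertically) is an assumption chosen for illustration, as are all numeric values.

```python
import numpy as np

rng = np.random.default_rng(2)
num_tags, d1, d2 = 3, 4, 5

# W[t] plays the role of W_{t,y} for one fixed class y (random stand-in values).
W = rng.normal(size=(num_tags, d1, d2))

phi_l = np.array([1.0, 0.0, 1.0, 0.0])        # made-up left-context features
phi_r = np.array([0.0, 1.0, 0.0, 0.0, 1.0])   # made-up right-context features
active_tags = [0, 2]                          # tags in phi_e(e) for this entity

def score_with_entity_tags(W, active_tags, phi_l, phi_r):
    """Sum of phi_l^T W_{t,y} phi_r over the active entity tags t."""
    return float(sum(phi_l @ W[t] @ phi_r for t in active_tags))

score = score_with_entity_tags(W, active_tags, phi_l, phi_r)

# Same score as a single tensor contraction with a binary tag vector.
phi_e = np.zeros(num_tags)
phi_e[active_tags] = 1.0
score_einsum = float(np.einsum("t,i,tij,j->", phi_e, phi_l, W, phi_r))

# One possible matricization (assumed here): stack the tag slices into a
# (num_tags * d1) x d2 matrix, whose rank can then be measured or penalized.
W_unfolded = W.reshape(num_tags * d1, d2)
unfolded_rank = int(np.linalg.matrix_rank(W_unfolded))
```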
Learning The Parameters
Objective Function
\[ \operatorname*{argmin}_{\mathbf{W}} \; L(\mathbf{W}) + \tau R(\mathbf{W}) \]
$L(\mathbf{W})$ is a convex loss function (negative log-likelihood)
$R(\mathbf{W})$ is a regularizer
$\tau$ is a constant that trades off error and capacity
Minimizing rank → non-convex function
↓
nuclear norm: convex relaxation (Srebro & Shraibman, 2005)
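The nuclear norm is the sum of the singular values of a matrix. The slides do not say which optimizer is used; one common choice for nuclear-norm regularized objectives is proximal gradient descent, whose proximal step soft-thresholds the singular values. The sketch below shows that step under this assumption; the function names and the example matrix are illustrative.

```python
import numpy as np

def nuclear_norm(W):
    """Sum of the singular values of W."""
    return float(np.linalg.svd(W, compute_uv=False).sum())

def prox_nuclear(W, threshold):
    """Proximal operator of threshold * nuclear norm (singular value soft-thresholding)."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    # Shrink every singular value by the threshold and clip at zero;
    # small singular values vanish, which lowers the rank.
    shrunk = np.maximum(sigma - threshold, 0.0)
    return U @ np.diag(shrunk) @ Vt

# Toy matrix with singular values 3.0, 1.0 and 0.2.
W = np.diag([3.0, 1.0, 0.2])
W_shrunk = prox_nuclear(W, threshold=0.5)

norm_before = nuclear_norm(W)
norm_after = nuclear_norm(W_shrunk)
rank_after = int(np.linalg.matrix_rank(W_shrunk))
```

In a proximal gradient loop, a gradient step on $L(\mathbf{W})$ would be followed by `prox_nuclear` with a threshold equal to the step size times $\tau$.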
Experimental Settings
Task: Named Entity Classification
Data: Annotated English CoNLL
Training: Minimal supervision (seeds) + large unlabeled data
Seed entities of the 10-30 set, by class (10 seeds per entity class, 30 for the non-entity class O):
PER: clinton, dole, arafat, yeltsin, wasim akram, lebed, dutroux, waqar younis, mushtaq ahmed, croft
LOC: u.s., england, germany, britain, australia, france, spain, pakistan, italy, china
ORG: reuters, u.n., oakland, puk, osce, cincinnati, eu, nato, ajax, honda
MISC: russian, german, british, french, dutch, english, israeli, european, iraqi, australian
O: year, percent, thursday, government, police, results, tuesday, soccer, president, monday, friday, people, minister, sunday, division, week, time, state, market, years, officials, group, company, saturday, match, at, world, home, august, standings
Experimental Settings (continued)
Evaluation: Mentions of unseen entities
CoNLL 2003 English Corpus
Share of ambiguous entities among all entities in each set:
train: 2% ambiguous, 98% non-ambiguous
dev: 1% ambiguous, 99% non-ambiguous
test: 2% ambiguous, 98% non-ambiguous
Most entities in each set are non-ambiguous.
*Entities: unique candidate entities
CoNLL 2003 English Corpus
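[Figure: overlap of unique candidate entities between train and dev. Labels: train 13341, dev 5270, 3303 (entities in dev that were also seen in train).]
Ambiguous: 177 out of 3303 (3.54%)
Almost all seen entities that appear in dev can be directly classified as the same class.
*Entities: unique candidate entities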
Results on dev set
[Figure: AVG-F1 (%) on the dev set plotted against seed set (10, 40, 640, All), with one series each for L1, L2 and NN.]
AVG-F1 on the dev set using different seed sets for training, comparing the $\ell_1$, $\ell_2$ and nuclear-norm (NN) regularizers.
Feature set: elementary features and all conjunctions of entity tags and left-right contexts (cluster & PoS), window size = 1
Seed set: number of examples per entity class (plus 3× as many non-entity examples)
Results on dev set
[Figure: six panels, each plotting AVG-F1 (%) on the dev set against seed set (10, 40, 640, All) for L1, L2 and NN.]
Panel 1: Only full conjunctions of left-right contexts (cluster), window size = 1
Panel 2: Elementary features and all conjunctions of entity tags and left-right contexts (cluster), window size = 1
Panel 3: Elementary features and all conjunctions of entity tags and left-right contexts (cluster & PoS), window size = 1
Panel 4: Only full conjunctions of entity tags and left-right contexts (cluster), window size = 1
Panel 5: Elementary features and all conjunctions of entity tags and left-right contexts (cluster), window size = 2
Panel 6: Elementary features and all conjunctions of entity tags and left-right contexts (cluster & PoS), window size = 2
Results on test set
[Figure: F1 (%) on the test set for PER, LOC, ORG, MISC and AVG, with one bar each for $\ell_1$, $\ell_2$ and nuclear-norm.]
F1 performance on the test set using the “all” seed set for training, with the best setting (based on results on dev) for each regularizer.
Model Dimensions
[Figure: AVG-F1 (%) plotted against model dimension (1 to 10, then 20 to 70 in steps of 10).]
Avg. F1 on development for increasing dimensions, using the best low-rank model on the development set, trained with all seeds.
Feature conjunctions in dev set
[Figure: four slides showing feature conjunctions in dev, each labeled with Cluster and PoS tags.]
Slide 1: conjunctions in dev
Slide 2: conjunctions in dev that are unseen in train (with 10 seeds) vs. conjunctions in dev that are seen in train (with 10 seeds)
Slides 3 and 4 (same legend): conjunctions in dev that are unseen in train (with 10 seeds) and have zero weight vs. conjunctions in dev that are unseen in train but are assigned non-zero weight by the model trained on 10 seeds
Conclusion
Low-rank regularization framework for sparse conjunctive feature spaces
Tensors
Nuclear-norm
Experimented on learning entity classifiers
Compared with $\ell_1$ and $\ell_2$ penalties → better results
Illustrated weight propagation to unseen conjunctions
Future work: explore different tensor transformations
Thank you!
