
Low-Rank Regularization for Sparse Conjunctive Feature Spaces:
An Application to Named Entity Classification

A. Primadhanty (1), X. Carreras (2), A. Quattoni (2)

(1) Universitat Politècnica de Catalunya
(2) Xerox Research Centre Europe

ACL-IJNLP 2015 1 / 28
Challenge

Conjunction of sparse elementary features → very sparse

Example: Named Entity Classification

A shipload of 12 tonnes of rice arrives in [Umm Qasr port] in the Gulf


$\phi_l(l)$ (left context), $\phi_e(e)$ (entity), $\phi_r(r)$ (right context): each elementary feature vector is sparse

Approaches

$\ell_1$ or $\ell_2$ regularization
unseen conjunctions?

Contribution

Low-rank regularization for sparse conjunctive feature spaces


Propagate weight to unseen conjunctions

Learning algorithm
Convex relaxation of the low-rank minimization function

Experiments
Improvement over $\ell_1$ and $\ell_2$ regularization

Task

Given:
$x = \langle l, e, r \rangle$: an entity mention $e$ with its left context $l$ and right context $r$

Goal:
Classify x into one entity class y in the set Y

A shipload of 12 tonnes of rice arrives in [Umm Qasr port] in the Gulf


l e r

y?

Classifier

Log-Linear Model

$$\Pr(y \mid x; \theta) = \frac{\exp\{s_\theta(x, y)\}}{\sum_{y'} \exp\{s_\theta(x, y')\}}$$

$s_\theta : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is a scoring function of entity tuples with a candidate class
$\theta$ are the parameters of this function
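
As a minimal sketch of the log-linear model, the snippet below turns per-class scores into class probabilities. The class names and score values are made-up illustration inputs, not numbers from the experiments, and subtracting the maximum score is a standard numerical-stability step that the slide does not mention.

```python
import math

def class_probabilities(scores):
    """Softmax over a dict {class_label: score}, i.e. Pr(y | x; theta)."""
    m = max(scores.values())  # shift by the max to avoid overflow in exp
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())    # normalizer: sum over all candidate classes y'
    return {y: e / z for y, e in exps.items()}

# Hypothetical scores s_theta(x, y) for one mention x.
probs = class_probabilities(
    {"PER": 0.2, "LOC": 2.0, "ORG": 0.5, "MISC": -1.0, "O": 0.0}
)
best = max(probs, key=probs.get)  # class with the highest probability
```

The predicted class is simply the one with the highest score, since the normalizer is shared by all classes.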

Scoring Function

Feature-based linear model

$$s_\theta(x, y) = \phi(x) \cdot w_y$$

$\phi : \mathcal{X} \to \{0, 1\}^n$ is a feature function representing entity tuples in an $n$-dimensional binary feature space
$\theta = \{w_y\}_{y \in \mathcal{Y}}$ contains one weight vector per class
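
A small sketch of the feature-based linear score, assuming the binary feature vector is stored as the set of its active feature names. The feature names and weights are hypothetical. It also illustrates the issue raised earlier: a conjunction that never received a weight contributes nothing to the score.

```python
def linear_score(active_features, weights):
    """phi(x) . w_y for a binary phi(x) stored as the set of active feature names."""
    # Features missing from `weights` have weight 0, so an unseen
    # feature (or conjunction) contributes nothing to the score.
    return sum(weights.get(f, 0.0) for f in active_features)

# Hypothetical active features for one mention, and hypothetical weights for class LOC.
phi_x = {"left=in", "right=in", "left=in&right=in"}
w_loc = {"left=in": 0.8, "left=in&right=in": 1.5}
score = linear_score(phi_x, w_loc)  # "right=in" has no weight and adds 0
```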

Scoring Function

Left-right context model

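$$s_\theta(\langle l, e, r \rangle, y) = \phi_l(l)^\top W_y\, \phi_r(r)$$

$\phi_l(l) \in \mathbb{R}^{d_1}$ is the feature vector representing the left context
$\phi_r(r) \in \mathbb{R}^{d_2}$ is the feature vector representing the right context
$W_y \in \mathbb{R}^{d_1 \times d_2}$ is the weight matrix for class $y$, such that $\theta = \{W_y\}_{y \in \mathcal{Y}}$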
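
The left-right context score can be sketched as follows, assuming binary feature vectors given as lists of active indices. The matrix values are hypothetical. With 0/1 features, each entry of the weight matrix acts as the weight of one left-right feature conjunction.

```python
def bilinear_score(left_active, right_active, W):
    """phi_l(l)^T W_y phi_r(r) for binary feature vectors given as active index lists."""
    # With 0/1 features the product reduces to summing W[i][j] over all
    # active pairs (i, j): W[i][j] is the weight of the conjunction (i, j).
    return sum(W[i][j] for i in left_active for j in right_active)

# Hypothetical 3 x 2 weight matrix for one class (d1 = 3, d2 = 2).
W_y = [[0.5, 0.0],
       [0.0, 1.0],
       [2.0, -1.0]]
score = bilinear_score([0, 2], [1], W_y)  # W_y[0][1] + W_y[2][1]
```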

Low Rank Parameter Matrices
SVD

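$$W_y =
\underbrace{\begin{bmatrix} u_{11} & \cdots & u_{1k} \\ \vdots & \ddots & \vdots \\ u_{d_1 1} & \cdots & u_{d_1 k} \end{bmatrix}}_{U_y}
\underbrace{\begin{bmatrix} \sigma_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_k \end{bmatrix}}_{\Sigma_y}
\underbrace{\begin{bmatrix} v_{11} & \cdots & v_{1 d_2} \\ \vdots & \ddots & \vdots \\ v_{k 1} & \cdots & v_{k d_2} \end{bmatrix}}_{V_y^\top}$$

Consider that $W_y$ has rank $k$

$U_y \in \mathbb{R}^{d_1 \times k}$ and $V_y \in \mathbb{R}^{d_2 \times k}$ are orthonormal projections
$\Sigma_y \in \mathbb{R}^{k \times k}$ is a diagonal matrix of singular values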

Score Function - Rewritten
Left-right context model

$$s_\theta(\langle l, e, r \rangle, y) = \phi_l(l)^\top W_y\, \phi_r(r)
= \underbrace{\begin{bmatrix} l_1 & \cdots & l_{d_1} \end{bmatrix}}_{\phi_l(l)^\top}
\underbrace{U_y\, \Sigma_y\, V_y^\top}_{\mathrm{SVD}(W_y)}
\underbrace{\begin{bmatrix} r_1 \\ \vdots \\ r_{d_2} \end{bmatrix}}_{\phi_r(r)}$$

Score Function - Rewritten

Left-right context model

$$s_\theta(\langle l, e, r \rangle, y) = \phi_l(l)^\top W_y\, \phi_r(r)
= \left( \phi_l(l)^\top U_y \right) \Sigma_y \left( V_y^\top \phi_r(r) \right)$$

Rank $k$ → intrinsic dimensionality of the inner product behind the score function
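
A minimal sketch of the factored score, assuming binary features given as active indices and hypothetical rank-1 factors. Both contexts are first embedded in $k$ dimensions, and the score is a weighted inner product there. Because every entry of the full matrix is tied to the shared factors, a conjunction of a left and a right feature gets a non-zero weight whenever both embeddings are non-zero, which gives the intuition for propagating weight to unseen conjunctions.

```python
def project(active, factor):
    """Sum the rows of `factor` selected by the active binary features (a k-dim embedding)."""
    k = len(factor[0])
    return [sum(factor[i][m] for i in active) for m in range(k)]

def low_rank_score(left_active, right_active, U, sigma, V):
    """(phi_l^T U) Sigma (V^T phi_r): an inner product in k dimensions."""
    a = project(left_active, U)   # left context embedded in R^k
    b = project(right_active, V)  # right context embedded in R^k
    return sum(a_m * s_m * b_m for a_m, s_m, b_m in zip(a, sigma, b))

def full_matrix(U, sigma, V):
    """Materialize W = U Sigma V^T, only to check the factored score."""
    k = len(sigma)
    return [[sum(u[m] * sigma[m] * v[m] for m in range(k)) for v in V] for u in U]

# Hypothetical rank-1 factors (k = 1) with d1 = 3 and d2 = 2.
U = [[0.6], [0.8], [0.0]]
sigma = [5.0]
V = [[0.6], [0.8]]

W = full_matrix(U, sigma, V)
factored = low_rank_score([0, 1], [1], U, sigma, V)
dense = W[0][1] + W[1][1]  # same score computed from the full matrix
```

The factored form never builds the $d_1 \times d_2$ matrix, so its cost grows with $k$ rather than with the number of possible conjunctions.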

Adding Entity Features

One parameter matrix per feature tag and class label, i.e. $\theta = \{W_{t,y}\}_{t \in \mathcal{T},\, y \in \mathcal{Y}}$

$$s_\theta(\langle l, e, r \rangle, y) = \sum_{t \in \phi_e(e)} \phi_l(l)^\top W_{t,y}\, \phi_r(r)$$

Parameters: tensor

Rank defined by matricization
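
A sketch of the score with entity features, for a single fixed class: one matrix per entity tag, summed over the tags that are active for the entity. The tag names and matrix values are hypothetical, and features are assumed binary and given as active indices.

```python
def tensor_score(left_active, entity_tags, right_active, W_by_tag):
    """Sum over active entity tags t of phi_l(l)^T W_{t,y} phi_r(r), binary features."""
    total = 0.0
    for t in entity_tags:
        W = W_by_tag[t]  # one d1 x d2 matrix per entity tag, for a fixed class y
        total += sum(W[i][j] for i in left_active for j in right_active)
    return total

# Hypothetical entity tags and 2 x 2 matrices for a single class.
W_by_tag = {
    "capitalized": [[1.0, 0.0], [0.0, 0.5]],
    "all-caps":    [[0.0, 2.0], [0.0, 0.0]],
    "has-digit":   [[9.0, 9.0], [9.0, 9.0]],  # not active below, so it is ignored
}
score = tensor_score([0, 1], ["capitalized", "all-caps"], [1], W_by_tag)
```

Stacking the per-tag matrices along a third axis gives the parameter tensor mentioned on the slide.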

Learning The Parameters

Objective Function

$$\operatorname*{argmin}_{W} \; L(W) + \tau R(W)$$

$L(W)$ is a convex loss function (negative log-likelihood)
$R(W)$ is a regularizer
$\tau$ is a constant that trades off error and capacity

Minimizing rank → non-convex function
Nuclear norm: convex relaxation (Srebro & Shraibman, 2005)
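
To make the nuclear norm concrete, the sketch below computes it as the sum of singular values and applies singular-value soft-thresholding with NumPy. The thresholding step is the proximal operator of the nuclear norm, a common building block for this kind of regularizer; using it here is an assumption for illustration, not a description of the optimizer used in the talk. The matrix is a hypothetical example.

```python
import numpy as np

def nuclear_norm(W):
    """Sum of the singular values of W (the convex surrogate for rank)."""
    return float(np.linalg.svd(W, compute_uv=False).sum())

def shrink_singular_values(W, threshold):
    """Soft-threshold the singular values of W by `threshold`.

    This is the proximal operator of threshold * ||W||_*. It is an assumed
    building block for illustration, not the talk's stated algorithm.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - threshold, 0.0)  # small singular values become exactly 0
    return (U * s_shrunk) @ Vt                 # scale the columns of U, then recombine

# Hypothetical 3 x 3 matrix with singular values 3, 1 and 0.2.
W = np.diag([3.0, 1.0, 0.2])
W_shrunk = shrink_singular_values(W, threshold=0.5)
rank_before = int(np.linalg.matrix_rank(W))
rank_after = int(np.linalg.matrix_rank(W_shrunk))
```

Shrinking removes the smallest singular value entirely, so the regularizer lowers the rank as a side effect of lowering the nuclear norm.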

Experimental Settings

Task: Named Entity Classification
Data: annotated English CoNLL
Training: minimal supervision (seeds) + large unlabeled data

Class   10-30 seed
PER     clinton, dole, arafat, yeltsin, wasim akram, lebed, dutroux, waqar younis, mushtaq ahmed, croft
LOC     u.s., england, germany, britain, australia, france, spain, pakistan, italy, china
ORG     reuters, u.n., oakland, puk, osce, cincinnati, eu, nato, ajax, honda
MISC    russian, german, british, french, dutch, english, israeli, european, iraqi, australian
O       year, percent, thursday, government, police, results, tuesday, soccer, president, monday, friday, people, minister, sunday, division, week, time, state, market, years, officials, group, company, saturday, match, at, world, home, august, standings

For each entity class, the seed entities in the 10-30 set.
Experimental Settings

Evaluation: mentions of unseen entities

CoNLL 2003 English Corpus

[Figure: share of ambiguous entities among all entities, per data set.]

Set     Ambiguous entities   Non-ambiguous entities
train   2%                   98%
dev     1%                   99%
test    2%                   98%

Most entities in each set are non-ambiguous.

*Entities: unique candidate entities

CoNLL 2003 English Corpus
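[Figure: overlap between the unique candidate entities of train and dev.]

train: 13341 entities
dev: 5270 entities
overlap (seen entities that appear in dev): 3303
ambiguous: 177 out of 3303 (3.54%)

Almost all seen entities that appear in dev can be directly classified as the same class.

*Entities: unique candidate entities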

Results on dev set

[Figure: AVG-F1 (%) on the dev set against seed set (10, 40, 640, All) for the L1, L2 and NN regularizers.]

AVG-F1 on the dev set using different seed sets for training, comparing $\ell_1$, $\ell_2$ and nuclear-norm (NN) regularizers.
Feature set: elementary features and all conjunctions of entity tags and left-right contexts (cluster & PoS), window size = 1
Seed set: number of examples per entity class (plus 3× as many non-entity examples)

Results on dev set
[Figure: six panels of AVG-F1 (%) on the dev set against seed set (10, 40, 640, All) for the L1, L2 and NN regularizers, one panel per feature set:]

Only full conjunctions of left-right contexts (cluster), window size = 1
Elementary features and all conjunctions of entity tags and left-right contexts (cluster), window size = 1
Elementary features and all conjunctions of entity tags and left-right contexts (cluster & PoS), window size = 1
Only full conjunctions of entity tags and left-right contexts (cluster), window size = 1
Elementary features and all conjunctions of entity tags and left-right contexts (cluster), window size = 2
Elementary features and all conjunctions of entity tags and left-right contexts (cluster & PoS), window size = 2

Results on test set

[Figure: F1 (%) on the test set for PER, LOC, ORG, MISC and AVG, comparing the $\ell_1$, $\ell_2$ and nuclear-norm regularizers.]

F1 performance on the test set using the “all” seed set for training, with the best setting (based on results on dev) for each regularizer.

Model Dimensions

[Figure: AVG-F1 (%) against dimension (1 to 10, then 20 to 70 in steps of 10).]

Avg. F1 on development for increasing dimensions, using the best low-rank model on the development set trained with all seeds.

[Figure: feature conjunctions in the dev set, labelled Cluster and PoS tags. The legend distinguishes:]

conjunctions in dev that are unseen in train (with 10 seeds) and have zero weight
conjunctions in dev that are seen in train (with 10 seeds)
conjunctions in dev that are unseen in train but assigned non-zero weight by the model trained on 10 seeds
Conclusion

Low-rank regularization framework for sparse conjunctive feature spaces
Tensors
Nuclear norm

Experiments on learning entity classifiers
Compared to $\ell_1$ and $\ell_2$ penalties → better results
Illustrated weight propagation to unseen conjunctions

Future work: explore different tensor transformations

Thank you!
