Academic documents
Professional documents
Culture documents
A is the n × D data matrix: the rows u_1, u_2, ..., u_n are the data points and the columns t_1, t_2, ..., t_D are the dimensions (e.g., terms):

$$A = \begin{array}{c|ccccc}
 & t_1 & t_2 & t_3 & \cdots & t_D \\ \hline
u_1 & * & * & * & \cdots & * \\
u_2 & * & * & * & \cdots & * \\
\vdots & & & & & \\
u_n & * & * & * & \cdots & *
\end{array}$$

• $$[AA^T]_{1,2} = u_1^T u_2 = \sum_{j=1}^{D} u_{1,j}\, u_{2,j}$$
is the inner product, an important measure of vector similarity.
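As a quick illustration (a hypothetical toy matrix, not from the talk), a minimal numpy sketch showing that the entries of $AA^T$ are exactly these pairwise inner products:

import numpy as np

# Toy n x D data matrix: n = 3 data points (rows), D = 5 dimensions (columns).
A = np.array([[0., 3., 0., 2., 1.],
              [1., 4., 0., 0., 2.],
              [0., 1., 5., 0., 0.]])

gram = A @ A.T                 # n x n matrix of all pairwise inner products
print(gram[0, 1])              # [AA^T]_{1,2} = u_1^T u_2
print(np.dot(A[0], A[1]))      # same value, computed directly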
• An example: Ravichandran et al. (ACL 2005) found the top similar nouns for each of n = 655,495 nouns, from a collection of D = 70 million Web pages. Brute force costs $O(n^2 D) \approx 10^{19}$ operations and may take forever. They used random projections.
• An approximate solution may help find the exact solution more efficiently. Example: database query optimization.
A 22,340,000,000
Knuth 5,530,000
Kalevala 1,330,000
• Comparisons.
$$A \in \mathbb{R}^{n\times D} \;\Longrightarrow\; \tilde A \in \mathbb{R}^{n\times k}$$

$$\left(u_1^T u_2 = \sum_{j=1}^{D} u_{1,j}\, u_{2,j}\right) \;\approx\; \left(\tilde u_1^T \tilde u_2 = \sum_{j=1}^{k} \tilde u_{1,j}\, \tilde u_{2,j}\right) \times \frac{D}{k}$$
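A minimal sketch of this random coordinate sampling estimator (hypothetical dense vectors; uniform sampling without replacement assumed):

import numpy as np

rng = np.random.default_rng(0)

def sampled_inner_product(u1, u2, k, rng):
    # Sample k of the D coordinates uniformly without replacement,
    # then rescale the partial inner product by D / k.
    D = len(u1)
    cols = rng.choice(D, size=k, replace=False)
    return np.dot(u1[cols], u2[cols]) * D / k

D, k = 10_000, 500
u1 = rng.standard_normal(D)
u2 = u1 + rng.standard_normal(D)             # correlated pair, so the inner product is sizeable
print(np.dot(u1, u2))                        # exact inner product
print(sampled_inner_product(u1, u2, k, rng)) # estimate from k sampled coordinates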
Sketching: Scan the data; compute specific summary statistics; repeat k times.
[Diagram: for the n × D data matrix, each data point (row) i is summarized by its own small sketch h_i, for i = 1, ..., n.]
• Random Projections
• Broder’s min-wise sketches.
A new algorithm
Normal random projections: $B = A\,R$, with $A \in \mathbb{R}^{n\times D}$, $R \in \mathbb{R}^{D\times k}$, $B \in \mathbb{R}^{n\times k}$. For two projected rows $v_1, v_2$ of $B$, with margins $m_1 = \|u_1\|^2$, $m_2 = \|u_2\|^2$ and inner product $a = u_1^T u_2$:

$$\begin{pmatrix} v_{1,i} \\ v_{2,i} \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \frac{1}{k}\begin{pmatrix} m_1 & a \\ a & m_2 \end{pmatrix} \right).$$

$$\hat a = v_1^T v_2 = \sum_{i=1}^{k} v_{1,i}\, v_{2,i}, \qquad E(\hat a) = a$$

$$\hat d = \|v_1 - v_2\|^2 = \sum_{i=1}^{k} \left(v_{1,i} - v_{2,i}\right)^2, \qquad E(\hat d) = d, \text{ where } d = \|u_1 - u_2\|^2$$
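A minimal sketch of normal random projections and the two linear estimators above (I assume R has i.i.d. N(0,1) entries with a 1/sqrt(k) scaling, so the projected coordinates have the (1/k)-scaled covariance shown):

import numpy as np

rng = np.random.default_rng(1)

D, k = 10_000, 200
u1 = rng.standard_normal(D)
u2 = u1 + rng.standard_normal(D)              # correlated pair, so a = u1^T u2 is sizeable

# B = A R with i.i.d. N(0,1) entries in R; the 1/sqrt(k) factor gives the
# covariance (1/k) [[m1, a], [a, m2]] for each projected coordinate pair.
R = rng.standard_normal((D, k)) / np.sqrt(k)
v1, v2 = u1 @ R, u2 @ R

a_hat = np.dot(v1, v2)                        # linear estimator of a = u1^T u2
d_hat = np.sum((v1 - v2) ** 2)                # linear estimator of d = ||u1 - u2||^2

print(np.dot(u1, u2), a_hat)
print(np.sum((u1 - u2) ** 2), d_hat)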
However:
• The marginal norms $m_1 = \|u_1\|^2$ and $m_2 = \|u_2\|^2$ can be computed exactly.
• $BB^T \approx AA^T$, but at least we can make the diagonals exact (easily).
• The off-diagonals can also be improved (with a little more work).
$$\begin{pmatrix} v_{1,i} \\ v_{2,i} \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \frac{1}{k}\begin{pmatrix} m_1 & a \\ a & m_2 \end{pmatrix} \right).$$

$$\hat a = v_1^T v_2, \qquad \mathrm{Var}(\hat a) = \frac{1}{k}\left(m_1 m_2 + a^2\right)$$

Using the exact margins, the MLE $\hat a_{MLE}$ solves the cubic equation

$$a^3 - a^2\, v_1^T v_2 + a\left(-m_1 m_2 + m_1\|v_2\|^2 + m_2\|v_1\|^2\right) - m_1 m_2\, v_1^T v_2 = 0,$$

$$\mathrm{Var}(\hat a_{MLE}) = \frac{1}{k}\,\frac{\left(m_1 m_2 - a^2\right)^2}{m_1 m_2 + a^2} \;\le\; \mathrm{Var}(\hat a) = \frac{1}{k}\left(m_1 m_2 + a^2\right)$$
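A sketch of the margin-assisted MLE, solving the cubic above with numpy.roots (the root selection by proximity to the linear estimate is my own heuristic, not from the slides):

import numpy as np

def mle_inner_product(v1, v2, m1, m2):
    # Margin-assisted MLE of a = u1^T u2 from normal random projections.
    # v1, v2: projected vectors (1/sqrt(k) scaling already applied);
    # m1, m2: exact margins ||u1||^2, ||u2||^2.
    c = np.dot(v1, v2)
    coeffs = [1.0,
              -c,
              -m1 * m2 + m1 * np.dot(v2, v2) + m2 * np.dot(v1, v1),
              -m1 * m2 * c]
    roots = np.roots(coeffs)
    real_roots = roots[np.abs(roots.imag) < 1e-8].real
    return real_roots[np.argmin(np.abs(real_roots - c))]

rng = np.random.default_rng(2)
D, k = 10_000, 200
u1 = rng.standard_normal(D)
u2 = u1 + rng.standard_normal(D)
R = rng.standard_normal((D, k)) / np.sqrt(k)
print(np.dot(u1, u2),
      mle_inner_product(u1 @ R, u2 @ R, np.dot(u1, u1), np.dot(u2, u2)))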
• Many applications (e.g., clustering, SVM kernels) only need the distances, so it does not matter much whether the estimators are linear or nonlinear.
• The MLE is even better. A highly accurate approximation is proposed for the distribution of the MLE, which has no closed-form distribution.
The MLE $\hat d_{MLE}$ solves

$$-\frac{k}{\hat d} + \sum_{i=1}^{k} \frac{2\hat d}{\left(v_{1,i} - v_{2,i}\right)^2 + \hat d^2} = 0.$$

The geometric mean estimator (for $k > 1$):

$$\hat d_{gm} = \cos^{k}\!\left(\frac{\pi}{2k}\right) \prod_{i=1}^{k} \left|v_{1,i} - v_{2,i}\right|^{1/k}.$$
The variance factor $\frac{\pi^2}{4k} \approx \frac{2.5}{k}$ implies that $\hat d_{gm}$ is about 80% efficient, since the MLE has variance on the order of $\frac{2.0}{k}$.
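A sketch of Cauchy random projections with both nonlinear estimators: the geometric-mean formula above, and the MLE found by numerically maximizing the Cauchy likelihood (the bounded search around the sample median is my own practical choice):

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)

def gm_estimate(x, k):
    # Geometric-mean estimator of the l1 distance d from the projected
    # differences x_i = v_{1,i} - v_{2,i}, each distributed Cauchy(0, d).
    return np.cos(np.pi / (2 * k)) ** k * np.exp(np.mean(np.log(np.abs(x))))

def mle_estimate(x):
    # MLE of d: maximize the Cauchy(0, d) log-likelihood, i.e., solve
    # -k/d + sum_i 2d / (x_i^2 + d^2) = 0.  The search is bounded around
    # the sample median of |x| (the median of |Cauchy(0, d)| equals d).
    nll = lambda d: -(len(x) * np.log(d) - np.sum(np.log(x ** 2 + d ** 2)))
    med = np.median(np.abs(x))
    return minimize_scalar(nll, bounds=(med * 1e-3, med * 1e3), method="bounded").x

D, k = 10_000, 100
u1, u2 = rng.standard_normal(D), rng.standard_normal(D)
R = rng.standard_cauchy((D, k))          # projection matrix with i.i.d. Cauchy entries
x = (u1 - u2) @ R                        # each entry ~ Cauchy(0, d), d = ||u1 - u2||_1

print(np.sum(np.abs(u1 - u2)))           # exact l1 distance d
print(gm_estimate(x, k), mle_estimate(x))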
• These bounds are not tight (we have tighter bounds).
• Without the restriction $\epsilon < 1$, the exponential bounds do not exist.
• We prefer the exponential bounds of the MLE.
Magic: they match the first four moments of an inverse Gaussian distribution, which has the same support as $\hat d_1$, $[0, \infty)$.
The moments:

$$E\left(\hat d_1\right) = d, \qquad \mathrm{Var}\left(\hat d_1\right) = \frac{2d^2}{k} + \frac{3d^2}{k^2}$$

$$E\left(\hat d_1 - E(\hat d_1)\right)^3 = \frac{12 d^3}{k^2} + O\!\left(\frac{1}{k^3}\right)$$

$$E\left(\hat d_1 - E(\hat d_1)\right)^4 = \frac{12 d^4}{k^2} + \frac{156 d^4}{k^3} + O\!\left(\frac{1}{k^4}\right)$$

The exact (asymptotic) fourth moment of $\hat d_1$ is $\frac{12 d^4}{k^2} + \frac{186 d^4}{k^3} + O\!\left(\frac{1}{k^4}\right)$.
The density:

$$\Pr\left(\hat d_1 = y\right) = \sqrt{\frac{\alpha d}{2\pi}}\; y^{-3/2} \exp\!\left(-\frac{(y-d)^2}{2 y \beta}\right).$$

A symmetric bound:

$$\Pr\left(\left|\hat d_1 - d\right| \ge \epsilon d\right) \le 2\exp\!\left(-\frac{\alpha\,\epsilon^2}{2(1+\epsilon)}\right), \qquad 0 \le \epsilon < 1$$
[Figure: empirical tail probabilities of $\hat d_1$ compared with the inverse Gaussian (IG) approximation, for k = 10, 20, 50, 100, 200, 400, as $\epsilon$ ranges from 0 to 1.]
Tail bound:

$$\Pr\left(\left|\hat d_1 - d\right| > \epsilon d\right) \le \exp\!\left(-\frac{\alpha\,\epsilon^2}{2(1+\epsilon)}\right) + \exp\!\left(-\frac{\alpha\,\epsilon^2}{2(1-\epsilon)}\right), \qquad 0 \le \epsilon < 1.$$
[Figure: empirical tail probabilities of $\hat d_1$ compared with the inverse Gaussian (IG) bound, for k = 10, 20, 50, 100, 200, 400, as $\epsilon$ ranges from 0 to 1.]
The inverse Gaussian Chernoff bound is reliable at least for $\xi/\nu \ge 10^{-10}$.
Harvard Dataset (PNAS 2001; thanks to Wing H. Wong): 176 specimens, 3 classes, 12,600 genes.

Using Cauchy random projections and both nonlinear estimators, the dimension can be reduced from 12,600 to 100 with little loss in accuracy.
[Figure: average absolute error (left) and average misclassifications (right) versus sample size k from 10 to 100, for the GM and MLE estimators.]
So far so good...
An example with D = 16:

     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
u1:  0  3  0  2  0  1  0  0  1  2  1  0  1  0  2  0
u2:  1  4  0  0  1  2  0  1  0  0  3  0  0  2  1  1

Postings P: store only the non-zeros as "ID (Value)", sorted ascending by the IDs.

P1: 2 (3), 4 (2), 6 (1), 9 (1), 10 (2), 11 (1), 13 (1), 15 (2)
P2: 1 (1), 2 (4), 5 (1), 6 (2), 8 (1), 11 (3), 14 (2), 15 (1), 16 (1)
We obtain exactly the same samples as if we had directly sampled the first $D_s$ columns. For example, when estimating pairwise distances for all n data points, there will be $\frac{n(n-1)}{2}$ different values of $D_s$.

The sketch size $k_i$ can be small, but the effective sample size $D_s$ can be very large. The sparser the data, the better.
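A minimal sketch of CRS under the description above (my own simplified implementation and function names: one random column permutation shared by all data points, sketches built from the postings, and the simple D/Ds-scaled inner-product estimate without any margin correction):

import numpy as np

rng = np.random.default_rng(4)

def crs_sketch(u, perm, k):
    # Keep the k non-zero entries whose *permuted* column IDs are smallest,
    # stored as (permuted ID, value) pairs sorted by ID -- built directly
    # from the postings (non-zeros) of u.
    ids = np.nonzero(u)[0]
    order = np.argsort(perm[ids])
    keep = ids[order[:k]]
    return sorted(zip(perm[keep].tolist(), u[keep].tolist()))

def crs_inner_product(s1, s2, D):
    # Effective sample size Ds: both sketches are complete over the first Ds
    # (permuted) columns, exactly as if those columns had been sampled directly.
    Ds = min(s1[-1][0], s2[-1][0]) + 1           # IDs here are 0-based
    d1 = {i: v for i, v in s1 if i < Ds}
    d2 = {i: v for i, v in s2 if i < Ds}
    a_s = sum(v * d2[i] for i, v in d1.items() if i in d2)
    return a_s * D / Ds                          # scale up, as in random coordinate sampling

# Hypothetical sparse non-negative data, just for illustration.
D = 5_000
u1 = np.where(rng.random(D) < 0.1, rng.integers(1, 5, D), 0)
u2 = np.where(rng.random(D) < 0.1, rng.integers(1, 5, D), 0)

perm = rng.permutation(D)                        # one permutation shared by all data points
s1, s2 = crs_sketch(u1, perm, k=100), crs_sketch(u2, perm, k=100)
print(np.dot(u1, u2), crs_inner_product(s1, s2, D))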
• In 0/1 data, the numbers of non-zeros ($f_i$, the document frequencies) are known. The MLE amounts to estimating a two-way contingency table with margin constraints; the solution is a cubic equation.
Sparsity: $f_1$ and $f_2$ are the numbers of non-zeros. Often $\frac{\max(f_1, f_2)}{D} < 1\%$.

$$D \sum_{j=1}^{D} u_{1,j}^2\, u_{2,j}^2 \;>\; \sum_{j=1}^{D} u_{1,j}^2 \sum_{j=1}^{D} u_{2,j}^2 \quad \text{usually, in heavy-tailed data.}$$
[Figure: variance ratio of CRS to random projections versus $a/f_2$, shown for $f_1 = 0.05D$ and $f_1 = 0.95D$, and for $f_2/f_1 = 0.8$ and $f_2/f_1 = 1$.]
When both use margins, the ratio $\frac{\mathrm{Var}(\mathrm{CRS})}{\mathrm{Var}(\mathrm{RP})}$ is $< 1$ almost always, unless $u_1$ and $u_2$ are nearly identical.
[Figure: variance ratio Var(CRS)/Var(RP), with both estimators using margins, versus $a/f_2$, for $f_1 = 0.05D$ and $f_1 = 0.95D$, and for $f_2/f_1 = 0.8$ and $f_2/f_1 = 1$.]
Data: each dataset has a total of $\frac{n(n-1)}{2}$ pairs of distances.

Evaluation metric: among the $\frac{n(n-1)}{2}$ pairs, the percentage for which CRS does better than random projections. We want this to be $> 0.5$.

Results...
[Figure: percentage of pairs for which CRS outperforms random projections, versus sample size k (10 to 50), for the inner product, l1 distance, l2 distance, and l2 distance using margins.]
[Figure: percentage of pairs for which CRS outperforms random projections, versus sample size k (10 to 30), for the inner product, l1 distance, l2 distance, and l2 distance using margins.]
COREL Image Data: CRS is still better than RP for the inner product and the l2 distance (using margins).
[Figure, COREL data: percentage of pairs for which CRS outperforms RP, versus sample size k (10 to 50), for the inner product, l1 distance, l2 distance, and l2 distance using margins.]
MSN Data (original): CRS does better than RP for the inner product and the l2 distance (using margins).
[Figure, MSN data (original): percentage of pairs for which CRS outperforms RP, versus sample size k (up to 150), for the inner product, l1 distance, l2 distance, and l2 distance using margins.]
MSN Data (square root): after the square-root transformation (as used in practice), CRS does better than RP for the inner product, l1, and l2 distances (using margins).
[Figure, MSN data (square root): percentage of pairs for which CRS outperforms RP, versus sample size k (up to 150), for the l1 distance, the l2 distance, and the l2 distance using margins.]
• With a fixed sketch size, the less frequent (but often more interesting) items are emphasized.
Conclusions
• Too much data (although never enough)
• Compact data representations
• Accurate approximation algorithms (estimators)
• Dimension Reduction Techniques (in addition to SVD)
• Random sampling
• Sketching (e.g., normal and Cauchy random projections)
• Conditional Random Sampling (sampling + sketching)
• Improve normal random projections (for l2) using margins, via a nonlinear MLE.
• Propose nonlinear estimators for Cauchy random projections (for l1).
• Conditional Random Sampling (CRS), for sparse data and 0/1 data
• Flexible (can adjust sample size according to sparsity)
• Good for estimating inner products
• Easy to take advantage of margins.
References