Alessandro Acquisti
(with Ralph Gross)
Heinz School and SCS
Carnegie Mellon University
One caveat
Agenda
Online social networks (OSN)
Social security numbers (SSN) and identity theft
OSN as breeder documents: Estimating SSNs from OSN data
Approach
Data
Pattern discovery
Estimation
Results
Conclusions
Online social networks
What are online social networks?
representation
Details offline
Predictability of area numbers
Predictability of serial numbers
Plots of R² of regressions of SN
R² increases significantly over time, especially for less populous states
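For reference, the statistic in these plots can be computed as follows. This is a minimal sketch that assumes a single-predictor linear regression of serial number on date of birth, with the date expressed as a day count. The slides do not specify the exact model. The function name `r_squared` and the toy values are hypothetical and are not the study's data.

```python
from statistics import fmean


def r_squared(xs, ys):
    """R^2 of an ordinary least-squares fit y = a + b*x with one predictor.

    Assumption: xs are birth dates as day offsets, ys are SSN serial numbers.
    """
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    ss_tot = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("need variation in both xs and ys")
    # Slope and intercept of the least-squares line
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    # Share of variance in ys explained by the fitted line
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


# Hypothetical toy data: birth-date offsets (days) and serial numbers
birth_days = [0, 30, 60, 90, 120]
serials = [1200, 1350, 1390, 1610, 1700]
print(f"R^2 = {r_squared(birth_days, serials):.3f}")
```

A higher R² means that birth date explains more of the variation in serial numbers. That is the sense in which serial numbers become more "predictable."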
Results
How we verified our estimates
Predicted SSNs based on:
Facebook data
SSDI (Social Security Death Index)
Compared predicted SSNs against a sample of actual
SSNs (protected, IRB-approved)
Benchmark
To date, more than 400 million SSNs have been
issued by the SSA
Currently, there are around 650 possible area
numbers, 99 possible group numbers, and 9,999
possible serial numbers
Combining them, the odds of guessing an individual's
SSN without using any information about that
individual are 1 in 643,435,650
0.000000155%
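The benchmark odds are the product of the three field sizes. The short sketch below reproduces that arithmetic. Because 643,435,650 = 650 × 99 × 9,999, it uses 650 area numbers, which is the count the stated odds imply. The constant and function names are illustrative only.

```python
AREA_NUMBERS = 650      # assumption: count implied by 643,435,650 / (99 * 9,999)
GROUP_NUMBERS = 99      # group numbers 01-99
SERIAL_NUMBERS = 9_999  # serial numbers 0001-9999


def random_guess_odds(area=AREA_NUMBERS, group=GROUP_NUMBERS, serial=SERIAL_NUMBERS):
    """Number of equally likely SSN combinations a blind guess chooses from."""
    return area * group * serial


combos = random_guess_odds()
print(f"1 in {combos:,}")        # 1 in 643,435,650
print(f"{100 / combos:.9f}%")    # 0.000000155%
```

Passing a different `area` value shows how sensitive the benchmark is to the assumed number of area numbers.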
Results from our estimates
Exact predictions: we correctly estimated…
5.8% exact area number
2.8% exact group number and exact area number
I.e., for 2.8% of our sample we could correctly identify
the first 5 digits of the SSN
(versus ~0.001% for a random guess)
Results from our estimates
Range predictions: we correctly estimated…
60% right window of possible area numbers
18.4% exact group number and right window of area
numbers
22% in window of +/-10 group number and right
window of area numbers
3.3% in window of +/-10 group number and exact
area number
Results from our estimates
Serial numbers and complete SSN: we correctly estimated…
2.3% of serial numbers within +/-500 of the exact serial
number, with the right window of area numbers and a +/-10 window
of group numbers
1% of serial numbers within +/-500 of the exact serial number,
with the exact area number and exact group number
Closest match absolute difference=278, average=2,212, sd=2,255
In other words:
For 1% of our sample, we could estimate exactly the first 5
digits of the SSN and the last 4 digits within a +/-500
range
Compare to 0.000000155% for a random guess (an improvement of
645,161 times)
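The match criteria used in these results can be expressed as a scoring function. This is a hypothetical sketch, not the authors' evaluation code. It assumes the following:

- SSNs are written as `AAA-GG-SSSS`.
- A "window of area numbers" is a set of candidate area numbers attached to each prediction.
- The names `SSN`, `match_levels`, and `hit_rate` are invented for illustration.

The example SSNs are made-up values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SSN:
    area: int    # first 3 digits
    group: int   # middle 2 digits
    serial: int  # last 4 digits

    @classmethod
    def parse(cls, text: str) -> SSN:
        area, group, serial = (int(part) for part in text.split("-"))
        return cls(area, group, serial)


def match_levels(predicted: SSN, actual: SSN, predicted_area_window: set[int]) -> dict[str, bool]:
    """Hypothetical scoring of one prediction using the slide's tolerance windows."""
    exact_area = predicted.area == actual.area
    exact_group = predicted.group == actual.group
    serial_pm500 = abs(predicted.serial - actual.serial) <= 500
    return {
        "exact_area": exact_area,
        "area_window": actual.area in predicted_area_window,
        "exact_group": exact_group,
        "group_pm10": abs(predicted.group - actual.group) <= 10,
        "serial_pm500": serial_pm500,
        # Exact first 5 digits and last 4 digits within +/-500
        "first5_exact_serial_pm500": exact_area and exact_group and serial_pm500,
    }


def hit_rate(triples, key: str) -> float:
    """Percentage of (predicted, actual, area_window) triples satisfying `key`."""
    results = [match_levels(p, a, w)[key] for p, a, w in triples]
    return 100 * sum(results) / len(results)


# Made-up example values
actual = SSN.parse("123-47-6300")
window = set(range(120, 130))
guesses = [(SSN.parse("123-45-6000"), actual, window),
           (SSN.parse("123-47-6000"), actual, window)]
print(hit_rate(guesses, "first5_exact_serial_pm500"))  # 50.0
```

Applied over a whole sample, each key yields one of the percentages reported above.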
Discussion
Some concluding remarks
The attack presented here is not unique to OSN, but
OSNs make it easy to attempt it
Attack purely based on public data