Академический Документы
Профессиональный Документы
Культура Документы
using Fingerprinting
Ingo Mller12 , Peter Sanders1 , Robert Schulze2 , and Wei Zhou12
Department of Informatics of KIT1 and HANA Core of SAP AG2
Sanders,
Schulze, Zhou:
KIT Mller,
University of
the State of Baden-Wuerttemberg
and
National
Laboratoryand
of the Perfect
Helmholtz Association
Retrieval
Hashing using
Fingerprinting
S V q t . . .
flag noun, verb,
flare n, v,
flask n,
flash n, v, . . .
Vq
optimization
Applications
Look-up in dictionaries of in-memory DBMSs (like the SAP HANA database [1])
Many more. . . (see Botelho et al. [2])
1
Known Methods
Perfect Hash Functions
Practical implementations exist: BPZ [3], CHD [4], etc.
Store only constant, sometimes optimal number of extra bits
Retrieval: use a PHF to index an array of values
Direct Retrieval Data Structures
CHM [5], etc.
Construction
Dynamic operations
Cache misses per query
Prior work
Our solution
complicated
inherently sequential
slow
no (rebuild)
2
simpler
easily parallelizable
faster
yes
1
Level 2
Level L
Level L 1
..
.
PHFbased
retrieval
data
structure
..
.
n
buckets
b
A bucket
..
.
fingerprints v1
v2
va
a cells
L levels
m buckets
fingerprints
PHF
fallback
a cells
n elements
m nb buckets
a cells per bucket
r-bit values
m buckets
PHF
fallback
fingerprints
n elements
m nb buckets
a cells per bucket
r-bit values
a cells
qsizefingerprintsq
r a a1
a1
bits
b
a1
35
30
25
20
15
10
5
0
1
k=149
k= 32
k= 64
k= 96
k=128
1.5
2
2.5
l -- cache misses
updates
queries
static part
S Vq t . . .
flag noun, verb,
flare n, v,
flask n,
flash n, v, . . .
dynamic part
Insertions
Needs a dynamic part with keys + some book-keeping information
Answer queries with the static part (FiRe)
Idea:
Overflow new and old element if fingerprints collide
Block fingerprint for future inserts
Rebuild when some stability criterion is violated
m buckets
fingerprints
PHF
fallback
a cells
Experimental Results
Settings
Experimental Results
Build Times
r 8 bits
build time [ns / element]
3000
2500
2000
1500
1000
500
0
.
-0
-3
99
.
-0
Z
BP
-2
PH
Fi
e5
e2
Fi
R
Fi
e5
Fi
Experimental Results
Space Overhead
r 8 bits
35
30
25
20
15
10
5
0
.
-0
.
-0
BP
-3
-2
PH
Fi
e5
Fi
e2
Fi
e5
Fi
99
Experimental Results
Query Times
r 8 bits
query time [ns / element]
400
350
300
250
200
150
100
50
0
C
BP
.
-0
.
-0
-3
-2
C
M
PH
Fi
e5
R
Fi
e2
Fi
e5
R
Fi
99
FiRe has the best query times due to low number of cache misses
FiPHa comparable to CHD-0.99, but has much faster construction
12
Thank you
14
References I
[1] F. Frber et al., SAP HANA Database: Data management for modern business applications, SIGMOD
Rec., vol. 40, no. 4, pp. 4551, 2012. [Online]. Available: http://doi.acm.org/10.1145/2094114.2094126
[2] F. C. Botelho and N. Ziviani, External perfect hashing for very large key sets, in Proceedings of the
sixteenth ACM conference on Conference on information and knowledge management. ACM, 2007, pp.
653662.
[3] F. C. Botelho, R. Pagh, and N. Ziviani, Simple and space-efficient minimal perfect hash functions, in
Algorithms and Data Structures. Springer, 2007, pp. 139150.
[4] D. Belazzougui, F. C. Botelho, and M. Dietzfelbinger, Hash, displace, and compress, in Algorithms-ESA
2009. Springer, 2009, pp. 682693.
[5] Z. J. Czech, G. Havas, and B. S. Majewski, An optimal algorithm for generating minimal perfect hash
functions, Information Processing Letters, vol. 43, no. 5, pp. 257264, 1992.
[6] D. de Castro Reis, D. Belazzougui, F. C. Botelho, and N. Ziviani, CMPH C Minimal Perfect Hashing
Library. [Online]. Available: http://cmph.sourceforge.net
15