DIMENSIONAL DATA IN
DISTRIBUTED HASH TABLES
MIKE MALONE
INFRASTRUCTURE ENGINEER
mike@simplegeo.com
@mjmalone
Locality vs Distributedness
Reductionism vs Emergence
ATOMICITY
Either all of a transaction’s actions are visible to another transaction, or none are
CONSISTENCY
Application-specific constraints must be met for a transaction to succeed
ISOLATION
Two concurrent transactions will not see one another’s changes while “in flight”
DURABILITY
The updates made to the database in a committed transaction will be visible to
future transactions
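A single-node relational database is enough to show the atomicity and consistency definitions above. The following is a minimal sketch using Python's standard `sqlite3` module. The `accounts` table, the `transfer` function, and the no-negative-balance rule are hypothetical and exist only to illustrate the two properties.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst as one transaction."""
    # `with conn:` commits if the block succeeds and rolls back if it raises,
    # so either both UPDATEs become visible or neither does (atomicity).
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        # Hypothetical application-specific constraint (consistency).
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 50), ("bob", 0)])
conn.commit()

transfer(conn, "alice", "bob", 30)
try:
    transfer(conn, "alice", "bob", 100)  # violates the constraint, rolled back
except ValueError:
    pass
print(balances(conn))  # {'alice': 20, 'bob': 30}
```

The failed transfer had already debited alice inside its transaction. Because of the rollback, no other transaction ever sees that intermediate state.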
CONSISTENCY
Every node in the system contains the same data (e.g., replicas are
never out of date)
AVAILABILITY
Every request to a non-failing node in the system returns a response
PARTITION TOLERANCE
System properties (consistency and/or availability) hold even when
the system is partitioned and data is lost
[Diagram: replicas exchanging “accept” and “ack” messages across a network partition; one outcome is labeled “UNAVAILABLE!”]
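You can simulate the trade-off in the diagram with two replicas and a network partition that can be switched on and off. This is a toy sketch and not a real replication protocol. The `TwoReplicaStore` class and its "CP" and "AP" modes are hypothetical names for the two choices a replica has when it cannot reach its peer.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


class TwoReplicaStore:
    """Two replicas plus a switchable partition.

    mode="CP": refuse writes that cannot reach both replicas (stay consistent).
    mode="AP": accept writes locally and let replicas diverge (stay available).
    """

    def __init__(self, mode: str):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False

    def write(self, key, value) -> bool:
        """Write via replica a; return True if the write is acknowledged."""
        if not self.partitioned:
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            return False  # unavailable: no ack while b is unreachable
        self.a.data[key] = value  # available: ack now, but b is stale
        return True

    def consistent(self) -> bool:
        return self.a.data == self.b.data


cp, ap = TwoReplicaStore("CP"), TwoReplicaStore("AP")
for store in (cp, ap):
    store.write("alice", "St. Louis")
    store.partitioned = True

print(cp.write("alice", "Chicago"), cp.consistent())  # False True
print(ap.write("alice", "Chicago"), ap.consistent())  # True False
```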
{
  "users": {                                  // column family
    "alice": {                                // key
      "city": ["St. Louis", 1287040737182],   // columns
      "name": ["Alice", 1287080340940],
    },
    ...
  },
}
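In this model, a column family maps row keys to rows, and each column stores a value together with a timestamp. When two writes to the same column conflict, Cassandra keeps the one with the newer timestamp. The sketch below imitates that last-write-wins merge using plain Python dictionaries. The `merge_rows` helper and the stale "Chicago" value are hypothetical. Tie-breaking is simplified so that the first row's value wins.

```python
# A column value is (value, timestamp); a row maps column name -> column value;
# a column family maps row key -> row.
Column = tuple[str, int]
Row = dict[str, Column]


def merge_rows(a: Row, b: Row) -> Row:
    """Merge two versions of a row, keeping the newest value of each column."""
    merged = dict(a)
    for name, (value, ts) in b.items():
        # Strictly newer timestamps win; on a tie the value from `a` is kept.
        if name not in merged or ts > merged[name][1]:
            merged[name] = (value, ts)
    return merged


users = {
    "alice": {
        "city": ("St. Louis", 1287040737182),
        "name": ("Alice", 1287080340940),
    },
}

# Hypothetical stale replica holding an older write of "city".
stale = {"city": ("Chicago", 1287000000000)}
print(merge_rows(stale, users["alice"])["city"])  # ('St. Louis', 1287040737182)
```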
[Diagram: the row keys “alice” and “bob” distributed across the cluster]
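A distributed hash table chooses the node that stores a row by hashing the row's key onto a ring of tokens. Below is a minimal consistent-hashing sketch. It assumes MD5 as the hash, which is what Cassandra's RandomPartitioner uses. The node names are hypothetical, and each node's token comes from hashing its name, which is a simplification: real clusters assign node tokens differently.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Ring position: the MD5 digest of the key read as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, nodes):
        # Sorted (token, node) pairs; a node's token is the hash of its name here.
        self.entries = sorted((token(n), n) for n in nodes)
        self.positions = [t for t, _ in self.entries]

    def node_for(self, key: str) -> str:
        """Owner of `key`: the first node token at or after the key's token, wrapping."""
        i = bisect.bisect_left(self.positions, token(key)) % len(self.entries)
        return self.entries[i][1]


ring = Ring(["node-a", "node-b", "node-c"])
for key in ("alice", "bob"):
    print(key, "->", ring.node_for(key))
```

Each key belongs to the first node clockwise from its token. As a result, a new node only takes over keys from one neighboring arc, and every other key stays where it was.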
STRANGE LOOP 2010
Monday, October 18, 2010
HASH TABLE
SUPPORTED QUERIES
EXACT MATCH
RANGE
PROXIMITY
ANYTHING THAT’S NOT
EXACT MATCH
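A hash table supports only exact-match queries because hashing scatters adjacent keys across partitions. In the sketch below, an exact-match lookup touches one partition, while a range query has to scan all of them. The four-partition layout, the `user000` to `user099` keys, and the helper names are all hypothetical.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_of(key: str) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


partitions = [dict() for _ in range(NUM_PARTITIONS)]
for i in range(100):
    key = f"user{i:03d}"
    partitions[partition_of(key)][key] = i


def get(key):
    """Exact match: hash the key and read exactly one partition."""
    return partitions[partition_of(key)].get(key)


def range_query(lo, hi):
    """Range [lo, hi): hashing destroyed key order, so scan every partition."""
    hits, touched = [], 0
    for part in partitions:
        touched += 1
        hits.extend(k for k in part if lo <= k < hi)
    return sorted(hits), touched


print(get("user042"))  # 42
print(range_query("user010", "user020")[1])  # 4 (all partitions)
```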
EXACT MATCH
RANGE
On a single dimension
? PROXIMITY
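An order-preserving partitioner keeps keys sorted, so a range on a single dimension maps to a contiguous run of partitions. The sketch below uses the same hypothetical keys as before, with made-up split points.

```python
import bisect

# Hypothetical split points: partition i holds keys in [SPLITS[i-1], SPLITS[i]).
SPLITS = ["user025", "user050", "user075"]
partitions = [dict() for _ in range(len(SPLITS) + 1)]


def partition_of(key: str) -> int:
    return bisect.bisect_right(SPLITS, key)


for i in range(100):
    key = f"user{i:03d}"
    partitions[partition_of(key)][key] = i


def range_query(lo: str, hi: str):
    """Scan only the partitions whose key span overlaps [lo, hi)."""
    first, last = partition_of(lo), partition_of(hi)
    hits = []
    for p in range(first, last + 1):
        hits.extend(k for k in partitions[p] if lo <= k < hi)
    return sorted(hits), last - first + 1


print(range_query("user010", "user020")[1])  # 1 partition scanned
```

This approach works only because the keys are sorted along one dimension. Latitude and longitude cannot both be sorted by a single plain key, so proximity queries remain unsolved.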
[Diagram: the plane recursively divided into quadrants numbered 1 to 4; a point x is labeled with the bit string 01101]
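The quadrant numbering and bit string above reflect the idea behind geohashes. You bisect the longitude and latitude ranges in turn, emit one bit per bisection, and interleave the bits to trace a Z-order curve, so that a single string orders points in two dimensions. Below is a minimal sketch of standard geohash encoding: longitude bit first, 5 bits per base-32 character. Applied to the moonrise hotel coordinates from the data model, it yields the same `9yzgcjn0` that appears in that model's key.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Interleave longitude/latitude bisection bits (Z-order), 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, nbits = [], 0, 0
    even = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


print(geohash_encode(38.6554420, -90.2992910))  # 9yzgcjn0
```

Truncating a geohash gives the larger cell that contains it. Points that share a prefix are therefore close together, although close points do not always share a prefix.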
DATA MODEL
{
  "record-index": {
    "9yzgcjn0:moonrise hotel": {              // key: <geohash>:<id>
      "": ["", 1287040737182],
    },
    ...
  },
  "records": {
    "moonrise hotel": {
      "latitude": ["38.6554420", 1287040737182],
      "longitude": ["-90.2992910", 1287040737182],
      ...
    }
  }
}
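When `<geohash>:<id>` keys are stored in sorted order, finding everything in a cell becomes a prefix scan over the record-index. In the sketch below, a hypothetical in-memory list stands in for the index. Every entry except the moonrise hotel row is made up.

```python
import bisect

# Hypothetical record-index rows, kept sorted as an order-preserving partitioner would.
index_keys = sorted([
    "9yzgcjn0:moonrise hotel",
    "9yzgcjn5:hypothetical cafe",
    "9yzgcm00:hypothetical museum",
    "9yzf0000:hypothetical park",
])


def prefix_scan(keys, prefix):
    """Return every key starting with `prefix` via a range scan over sorted keys."""
    start = bisect.bisect_left(keys, prefix)
    out = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match
        out.append(key)
    return out


def ids_in_cell(keys, geohash_prefix):
    # The id is everything after the first ':' in "<geohash>:<id>".
    return [k.split(":", 1)[1] for k in prefix_scan(keys, geohash_prefix)]


print(ids_in_cell(index_keys, "9yzgcj"))  # ['moonrise hotel', 'hypothetical cafe']
```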
[Diagram: quadrants numbered 1 to 4]
SPATIAL DATA
STILL MULTIDIMENSIONAL
DIMENSIONALITY REDUCTION ISN’T PERFECT
Clients must
• Pre-process to compose multiple queries
• Post-process to filter and merge results
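Degenerate cases can be bad, particularly for nearest-neighbor queries: two points that are close in space can fall on opposite sides of a cell boundary and end up far apart in key order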
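The pre-processing and post-processing steps can be sketched under a few assumptions. The query radius is smaller than one geohash cell. Cells do not wrap at the poles or at the antimeridian. The index is a sorted in-memory list of `<geohash>:<id>` keys. The client builds nine prefix queries, one for the query's own cell and one for each of its eight neighbors, then merges the results and filters them by great-circle distance. The sample points are hypothetical and sit on either side of the equator and the prime meridian to show the degenerate case: they are about 300 m apart but share no geohash prefix.

```python
import bisect
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Standard geohash: interleave longitude/latitude bisection bits."""
    lat_r, lon_r = [-90.0, 90.0], [-180.0, 180.0]
    bits = []
    for i in range(precision * 5):
        rng, val = (lon_r, lon) if i % 2 == 0 else (lat_r, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append("1")
            rng[0] = mid
        else:
            bits.append("0")
            rng[1] = mid
    return "".join(BASE32[int("".join(bits[i:i + 5]), 2)] for i in range(0, len(bits), 5))


def cell_size(precision):
    """(height, width) in degrees of a geohash cell at this precision."""
    nbits = precision * 5
    return 180.0 / 2 ** (nbits // 2), 360.0 / 2 ** ((nbits + 1) // 2)


def covering_prefixes(lat, lon, precision):
    """Pre-process: the query's cell plus its 8 neighbors (no wrap-around)."""
    dlat, dlon = cell_size(precision)
    return {
        geohash_encode(lat + i * dlat, lon + j * dlon, precision)
        for i in (-1, 0, 1)
        for j in (-1, 0, 1)
    }


def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearby(index, points, lat, lon, radius_km, precision=5):
    """index: sorted '<geohash>:<id>' keys; points: id -> (lat, lon)."""
    found = set()
    for prefix in covering_prefixes(lat, lon, precision):  # one scan per cell
        start = bisect.bisect_left(index, prefix)
        for key in index[start:]:
            if not key.startswith(prefix):
                break
            found.add(key.split(":", 1)[1])  # merge: de-duplicate ids
    # Post-process: drop candidates that share a cell but are too far away.
    return sorted(i for i in found if haversine_km(lat, lon, *points[i]) <= radius_km)


points = {"north": (0.001, 0.001), "south": (-0.001, -0.001), "far": (0.03, 0.03)}
index = sorted(f"{geohash_encode(la, lo, 8)}:{pid}" for pid, (la, lo) in points.items())
print(nearby(index, points, 0.001, 0.001, radius_km=1.0))  # ['north', 'south']
```

A single-prefix query on the cell `s0000` would return "north" and "far". It would miss "south", whose cell is `7zzzz`, and it would keep "far" even though that point lies about 4.5 km away.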
THE WORLD
IS NOT BALANCED
[Diagram: the quadrant grid (cells 1 to 4) drawn over SAN FRANCISCO, where the points (o, x) cluster; caption: “I’m sad.”]
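One common response to skew is to split cells adaptively, so that a cell divides into four quadrants only when it holds too many points. This is offered as an illustration, not as the solution the talk itself proposes. The following is a minimal point-quadtree sketch. The `capacity` and `max_depth` parameters and the sample coordinates are hypothetical.

```python
class QuadTree:
    """Point quadtree whose leaves split once they hold more than `capacity` points."""

    def __init__(self, lat_min=-90.0, lat_max=90.0, lon_min=-180.0, lon_max=180.0,
                 capacity=4, depth=0, max_depth=32):
        self.bounds = (lat_min, lat_max, lon_min, lon_max)
        self.capacity, self.depth, self.max_depth = capacity, depth, max_depth
        self.points = []
        self.children = None  # [SW, SE, NW, NE] once split

    def _child_for(self, lat, lon):
        lat_min, lat_max, lon_min, lon_max = self.bounds
        north = lat >= (lat_min + lat_max) / 2
        east = lon >= (lon_min + lon_max) / 2
        return self.children[2 * north + east]

    def insert(self, lat, lon):
        if self.children is not None:
            self._child_for(lat, lon).insert(lat, lon)
            return
        self.points.append((lat, lon))
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        lat_min, lat_max, lon_min, lon_max = self.bounds
        lat_mid, lon_mid = (lat_min + lat_max) / 2, (lon_min + lon_max) / 2
        args = dict(capacity=self.capacity, depth=self.depth + 1, max_depth=self.max_depth)
        self.children = [
            QuadTree(lat_min, lat_mid, lon_min, lon_mid, **args),  # SW
            QuadTree(lat_min, lat_mid, lon_mid, lon_max, **args),  # SE
            QuadTree(lat_mid, lat_max, lon_min, lon_mid, **args),  # NW
            QuadTree(lat_mid, lat_max, lon_mid, lon_max, **args),  # NE
        ]
        points, self.points = self.points, []
        for lat, lon in points:  # redistribute; children may split further
            self._child_for(lat, lon).insert(lat, lon)

    def leaves(self):
        """Yield (depth, point_count) for every leaf cell."""
        if self.children is None:
            yield self.depth, len(self.points)
        else:
            for child in self.children:
                yield from child.leaves()


tree = QuadTree()
for i in range(50):  # hypothetical dense cluster around San Francisco
    tree.insert(37.77 + i * 0.001, -122.42 + i * 0.001)
for lat, lon in [(40.7, -74.0), (51.5, -0.1), (-33.9, 151.2)]:
    tree.insert(lat, lon)
depths = [d for d, n in tree.leaves() if n]
print(min(depths), max(depths))
```

Dense areas end up with deep, small cells and sparse areas with shallow, large ones. Each leaf therefore holds a bounded number of points.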
EXACT MATCH
RANGE
PROXIMITY
SOMETHING ELSE I HAVEN’T
EVEN HEARD OF