Brendan McAdams
10gen, Inc.
brendan@10gen.com
@rit
From This...
... To This
Cows, er...
Large Dataset
Primary Key as username
Major concerns
Can I read & write this data efficiently at different scales?
Can I run calculations on large portions of this data?
Large Dataset
Primary Key as username
[diagram: individual documents (x, b, v, t, d, f, z, s, h, e, u, c, w, a, y, g) in one large dataset]
MongoDB sharding (as well as HDFS) breaks data into chunks (~64 MB)
[diagram: the same documents broken up into chunks]
Representing data as chunks allows many levels of scale across n data nodes
[diagram: chunks distributed and rebalanced across multiple data nodes]
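A minimal mongo shell sketch of setting this up on the MongoDB side, assuming a database named "mail" and a collection "mail.messages" sharded on username (both names are illustrative, not from the deck):

  // allow the "mail" database to hold sharded collections
  sh.enableSharding("mail")

  // split mail.messages into chunks by username, so reads, writes,
  // and calculations can spread across the shards
  sh.shardCollection("mail.messages", { username: 1 })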
to: brendan, from: tyler,   subject: Re: Ruby Support
to: mike,    from: brendan, subject: Node Support
to: brendan, from: mike,    subject: Re: Node Support
to: mike,    from: tyler,   subject: COBOL Support
to: tyler,   from: mike,    subject: Re: COBOL Support (WTF?)
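Stored in MongoDB, each of these messages might look roughly like the document below; the collection name "messages" and the exact field set are assumptions for illustration:

  db.messages.insert({
    to: "brendan",
    from: "tyler",
    subject: "Re: Ruby Support"
  })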
map function: for each message, emit(k, v) with the sender as the key and {count: 1} as the value

to: brendan, from: tyler,   subject: Re: Ruby Support             → key: tyler,   value: {count: 1}
to: mike,    from: brendan, subject: Node Support                 → key: brendan, value: {count: 1}
to: brendan, from: mike,    subject: Re: Node Support             → key: mike,    value: {count: 1}
to: mike,    from: tyler,   subject: COBOL Support                → key: tyler,   value: {count: 1}
to: tyler,   from: mike,    subject: Re: COBOL Support (WTF?)     → key: mike,    value: {count: 1}
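In MongoDB's JavaScript map/reduce, this map step might be written roughly as follows, assuming each document carries a "from" field as in the sample messages above:

  function map() {
    // called once per document; "this" is the current message
    emit(this.from, { count: 1 });  // one vote per message, keyed by sender
  }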
Group like keys together and collect their values (done automatically by M/R frameworks)

key: tyler,   values: [{count: 1}, {count: 1}]
key: brendan, values: [{count: 1}]
key: mike,    values: [{count: 1}, {count: 1}]
result (after reduce)

key: tyler,   value: {count: 2}
key: brendan, value: {count: 1}
key: mike,    value: {count: 2}
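A matching reduce function, plus a sketch of running the whole job from the mongo shell; the collection name "messages" and the output collection "messages_per_sender" are assumptions, not from the deck:

  function reduce(key, values) {
    // sum the per-message counts emitted for one sender
    var total = 0;
    values.forEach(function (v) { total += v.count; });
    return { count: total };  // same shape as the emitted values, so reduce can be re-applied
  }

  db.messages.mapReduce(map, reduce, { out: "messages_per_sender" })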
Google has moved from batch toward more realtime processing over the last few years
QUESTIONS?
*Contact Me*
brendan@10gen.com
(twitter: @rit)