Cloudstock 2010

Alvin Richards
alvin@10gen.com
Topics
Overview
Document Design
Modeling the “real world”
Replication & Sharding
Developing with MongoDB
Deployment
Drinking from the fire hose
Part One
MongoDB Overview
MongoDB is the leading database
for cloud deployment
web 2.0 companies started out using this

but now:
- enterprises
- financial industries
3 Reasons
- Performance
- Large number of readers / writers
- Large data volume
- Agility (ease of development)
NoSQL Really
Means:
non-relational, next-generation
operational datastores and databases
RDBMS
(Oracle, MySQL)
past : one-size-fits-all
RDBMS
(Oracle, MySQL)
New Gen. OLAP
(vertica, aster, greenplum)
present : business intelligence and analytics is now its own segment.

RDBMS
(Oracle, MySQL)
New Gen. OLAP
(vertica, aster, greenplum)
Non-relational Operational Stores
(“NoSQL”)
future
we claim nosql segment will be:
* large
* not fragmented
* ‘platformitize-able’
Philosophy: maximize features - up to the “knee” in the curve, then stop
[Chart: scalability & performance versus depth of functionality, with memcached, key/value and RDBMS plotted]
no joins
+ no complex transactions
Horizontally Scalable
Architectures
no joins
+ no complex transactions
New Data Models

Improved ways to develop
Part Two
Data Modeling in MongoDB
So why model data?
http://www.flickr.com/photos/42304632@N00/493639870/
A brief history of normalization
• 1970 E.F.Codd introduces 1st Normal Form (1NF)
• 1971 E.F.Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)
Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query
* source : wikipedia
The real benefit of relational
• Before relational
• Data and Logic combined
• After relational
• Separation of concerns
• Data modeled independent of logic
• Logic freed from concerns of data design
• MongoDB continues this separation

Relational made normalized
data look like this
Document databases make
normalized data look like this
Terminology
RDBMS           MongoDB
Table           Collection
Row(s)          JSON Document
Index           Index
Join            Embedding & Linking
Partition       Shard
Partition Key   Shard Key
Create a document
Design documents that simply map to
your application
post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}
>db.posts.save(post)
Add an index, find via index
Secondary index for “author”
// 1 means ascending, -1 means descending
>db.posts.ensureIndex({author: 1})
>db.posts.find({author: 'Hergé'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Hergé",
... }
Explain a query plan
> db.posts.find({author: 'Hergé'}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : {
    "author" : [
      [
        "Hergé",
        "Hergé"
      ]
    ]
  }
}
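A minimal sketch of why the indexed query reports such a small nscanned. It uses plain JavaScript arrays and a Map as stand-ins: the sample posts and the names findByScan, buildAuthorIndex and findByIndex are illustrative assumptions, not MongoDB APIs, and a real MongoDB index is a B-tree rather than a hash map.

```javascript
// In-memory stand-in for the posts collection (sample data is made up).
const posts = [
  { _id: 1, author: "Hergé", text: "Destination Moon" },
  { _id: 2, author: "Goscinny", text: "Asterix the Gaul" },
  { _id: 3, author: "Hergé", text: "Explorers on the Moon" },
];

// Without an index every document has to be examined.
function findByScan(docs, author) {
  let scanned = 0;
  const results = docs.filter((doc) => {
    scanned++;
    return doc.author === author;
  });
  return { results, scanned };
}

// Build a simple secondary index: author -> documents by that author.
function buildAuthorIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc.author)) index.set(doc.author, []);
    index.get(doc.author).push(doc);
  }
  return index;
}

// With the index only the matching entries are touched.
function findByIndex(index, author) {
  const results = index.get(author) || [];
  return { results, scanned: results.length };
}

const authorIndex = buildAuthorIndex(posts);
console.log(findByScan(posts, "Hergé").scanned);
console.log(findByIndex(authorIndex, "Hergé").scanned);
```

The scan touches every document while the index lookup touches only the matches, which is the difference the nscanned field makes visible.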
Query operators
Conditional operators:
$lt, $lte, $gt, $gte, $ne,
$in, $nin, $mod, $all, $size, $exists, $type, ..
// find posts with any tags
> db.posts.find({tags: {$exists: true}})
Query operators

Regular expressions:
// posts where author starts with h
> db.posts.find({author: /^h/i })
Counting:
// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()
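To make the operator semantics concrete, here is a small in-memory sketch of how such query documents can be evaluated. It is an illustration only: find and matchesCondition are hypothetical helpers written for this example, they cover just a handful of operators on primitive values, and they are not how the MongoDB server implements queries.

```javascript
// Evaluate one field value against one condition from a query document.
function matchesCondition(value, cond) {
  // Regular expression condition, e.g. {author: /^h/i}
  if (cond instanceof RegExp) {
    return typeof value === "string" && cond.test(value);
  }
  // Operator document, e.g. {$gt: 0} or {$exists: true}
  if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case "$exists": return (value !== undefined) === arg;
        case "$gt": return value > arg;
        case "$lt": return value < arg;
        case "$ne": return value !== arg;
        case "$in": return arg.includes(value);
        default: throw new Error("unsupported operator: " + op);
      }
    });
  }
  // Plain value: equality, or membership when the field holds an array.
  return Array.isArray(value) ? value.includes(cond) : value === cond;
}

// Return the documents for which every field condition holds.
function find(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, cond]) =>
      matchesCondition(doc[field], cond)));
}

// Sample data (made up for the example).
const samplePosts = [
  { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] },
  { author: "Hergé", text: "Untagged draft" },
  { author: "Goscinny", text: "Asterix the Gaul", tags: ["comic"] },
];

console.log(find(samplePosts, { tags: { $exists: true } }).length);
console.log(find(samplePosts, { author: /^h/i }).length);
```

Counting is then just the length of the result, which mirrors find(...).count() in the shell.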
Part Three
Modeling the “real world”
Inheritance
Single Table Inheritance - RDBMS
shapes table
id  type    area  radius  d  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10               5       2
Single Table Inheritance
>db.shapes.find()
{ _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1}
{ _id: ObjectId("..."), type: "square", area: 4, d: 2}
{ _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2}
// find shapes where radius > 0

>db.shapes.find({radius: {$gt: 0}})
// create index
>db.shapes.ensureIndex({radius: 1})
One to Many
One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle
One to Many
- Embedded Array / Array Keys
- $slice operator to return a subset of the array
- some queries are hard,
e.g. find latest comments across all documents
- Embedded tree
- Single document
- Natural
- Hard to query
- Normalized (2 collections)
- most flexible
- more queries
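The trade-off between the embedded and normalized options can be sketched with plain JavaScript data. The blog posts, comment fields (by, at, text) and helper names below are assumptions made up for this illustration; they are not taken from the deck or from any MongoDB API.

```javascript
// Embedded array: comments live inside each post document.
const embeddedPosts = [
  { _id: 1, title: "Destination Moon", comments: [
    { by: "ann", at: 3, text: "Classic" },
    { by: "bob", at: 7, text: "Loved it" },
  ] },
  { _id: 2, title: "Explorers on the Moon", comments: [
    { by: "cy", at: 5, text: "Sequel!" },
  ] },
];

// Normalized: comments are their own collection, linked by post_id.
const commentCollection = [
  { post_id: 1, by: "ann", at: 3, text: "Classic" },
  { post_id: 1, by: "bob", at: 7, text: "Loved it" },
  { post_id: 2, by: "cy", at: 5, text: "Sequel!" },
];

// Easy with embedding: the last n comments of one post (a slice of the array).
function lastComments(post, n) {
  return post.comments.slice(-n);
}

// Harder with embedding: latest comments across all posts means
// unwinding every post's array before sorting.
function latestEmbedded(posts, n) {
  return posts
    .flatMap((p) => p.comments.map((c) => ({ post_id: p._id, ...c })))
    .sort((a, b) => b.at - a.at)
    .slice(0, n);
}

// With two collections the same question is a single sort on comments.
function latestNormalized(comments, n) {
  return [...comments].sort((a, b) => b.at - a.at).slice(0, n);
}

console.log(latestEmbedded(embeddedPosts, 2).map((c) => c.by));
console.log(latestNormalized(commentCollection, 2).map((c) => c.by));
```

Both approaches give the same answer; the difference is how much of each post has to be read to get it, and how many queries it takes to show one post with its comments.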
One to Many - patterns
- Embedded Array / Array Keys
- Embedded tree
- Normalized
Many - Many
Example:
- Product can be in many categories

- Category can have many products
Many - Many
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Destination Moon",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af")]}

categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "Adventure",
  product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
                 ObjectId("4c4ca30433fb5941681b9130"),
                 ObjectId("4c4ca30433fb5941681b913a")]}
// All categories for a given product
>db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
Alternative
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Destination Moon",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af")]}

categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "Adventure"}
// All products for a given category
>db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
// All categories for a given product
// (findOne returns the document itself; find would return a cursor)
>product = db.products.findOne({_id: some_id})
>db.categories.find({_id: {$in: product.category_ids}})
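The two lookups in the alternative design can be tried out without a server using plain arrays. This is only a sketch: the short string ids, the second product and the second category are invented for the example, and productsInCategory and categoriesOfProduct are hypothetical helper names.

```javascript
// Only products carry the link (category_ids); categories stay small.
const products = [
  { _id: "p1", name: "Destination Moon", category_ids: ["c1", "c2"] },
  { _id: "p2", name: "Asterix the Gaul", category_ids: ["c1"] },
];
const categories = [
  { _id: "c1", name: "Adventure" },
  { _id: "c2", name: "Science Fiction" },
];

// All products for a given category: one query on the array field.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// All categories for a given product: fetch the product first,
// then look up its category ids (the $in step in the shell version).
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory("c1").map((p) => p.name));
console.log(categoriesOfProduct("p1").map((c) => c.name));
```

The price of storing the link on one side only is the extra round trip in categoriesOfProduct; the benefit is that there is a single list to keep up to date.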
Modeling - Summary
• Ability to model rich data constructions

• Relationships (1-1, 1-M, M-M)
• Trees
• Queues, Stacks
• Simple to change your data design
• Quickly map your application needs to data needs
Part Three
Replication & Sharding
Scaling
• Data size only goes up

• Operations/sec only go up
• Vertical scaling is limited
• Hard to scale vertically in the cloud
• Can scale wider than higher
What is scaling?
Well - hopefully for everyone here.
Read Scalability : Replication
read
ReplicaSet 1
Primary
Secondary
Secondary
write
Basics
• MongoDB replication is a bit like RDBMS replication
Asynchronous master/slave at its core
• Variations:
Master / slave
Replica Sets
Replica Sets
• A cluster of N servers
• Any (one) node can be primary
• Consensus election of primary
• Automatic failover
• Automatic recovery
• All writes to primary
• Reads can be to primary (default) or a secondary
Replica Sets – Design Concepts
1. Write is durable once available on a majority of members
2. Writes may be visible before a cluster wide commit has been completed
3. On a failover, if data has not been replicated from the primary, the data is dropped (see #1).
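The "majority" in design concept 1 is simple arithmetic, sketched below. The function names majority and isDurable are made up for this example, and the sketch assumes every member counts equally.

```javascript
// Smallest number of members that is more than half of the set.
function majority(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// A write counts as durable (in the sense of concept 1) once at
// least a majority of members have it.
function isDurable(ackCount, memberCount) {
  return ackCount >= majority(memberCount);
}

console.log(majority(3));
console.log(isDurable(1, 3));
console.log(isDurable(2, 3));
```

In a three-member set a write held only by the primary is not yet durable in this sense, which is why it can be dropped on failover as concept 3 describes.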
Replica Set: Establishing
Member 1
Member 3
Member 2
Replica Set: Electing primary
Member 1
Member 3
Member 2
PRIMARY
Replica Set: Failure of master
negotiate new master
Member 1
Member 3
PRIMARY
Member 2
DOWN
Replica Set: Reconfiguring
Member 1
Member 3
PRIMARY
Member 2
DOWN
Replica Set: Member recovers
Member 1
Member 3
PRIMARY
Member 2
RECOVERING
Replica Set: Active
Member 1
Member 3
PRIMARY
Member 2
Write Scalability: Sharding
read
key range 0 .. 30    key range 31 .. 60    key range 61 .. 100
ReplicaSet 1         ReplicaSet 2          ReplicaSet 3
Primary              Primary               Primary
Secondary            Secondary             Secondary
Secondary            Secondary             Secondary
write
Sharding
• Scale horizontally for data size, index size, write and consistent read scaling
• Distribute databases, collections or objects in a collection
• Auto-balancing, migrations, management happen with no down time
• Replica Sets for inconsistent read scaling

Sharding
• Choose how you partition data
• Can convert from single master to sharded system with no downtime
• Same features as non-sharding single master
• Fully consistent
Range Based
• collection is broken into chunks by range
• chunks default to 200 MB or 100,000 objects
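Range-based partitioning amounts to looking up which chunk's range contains a shard key value. The sketch below reuses the three key ranges from the sharding diagram; the chunk table layout, the inclusive bounds and the shardFor helper are assumptions for illustration, not the actual mongos data structures.

```javascript
// Chunk table: each chunk covers an inclusive key range and lives on one shard.
const chunks = [
  { min: 0, max: 30, shard: "ReplicaSet 1" },
  { min: 31, max: 60, shard: "ReplicaSet 2" },
  { min: 61, max: 100, shard: "ReplicaSet 3" },
];

// Find the shard whose chunk range contains the given shard key value.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key <= c.max);
  if (!chunk) throw new RangeError("no chunk covers key " + key);
  return chunk.shard;
}

console.log(shardFor(12));
console.log(shardFor(31));
console.log(shardFor(100));
```

Because the lookup depends only on the key, any operation that includes the shard key can be sent to exactly one shard.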
Architecture
Shards: mongod mongod mongod ...
        mongod mongod mongod
Config Servers: mongod mongod mongod
mongos mongos ...
client
Writes
• Inserts: require shard key, routed
• Removes: routed and/or scattered
• Updates: routed or scattered
Queries
• By shard key: routed
• Sorted by shard key: routed in order
• By non shard key: scatter gather
• Sorted by non shard key: distributed merge sort
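The last case, a sort on a non shard key, can be sketched as each shard sorting its own documents followed by a merge of the sorted partial results. This is an in-memory illustration only: mergeSorted, scatterGatherSorted and the sample shards are assumptions for the example, not the mongos implementation.

```javascript
// Merge lists that are each already sorted into one sorted list.
function mergeSorted(lists, compare) {
  const positions = lists.map(() => 0); // next unread index per list
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (positions[i] >= lists[i].length) continue; // list exhausted
      if (best === -1 ||
          compare(lists[i][positions[i]], lists[best][positions[best]]) < 0) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list exhausted
    merged.push(lists[best][positions[best]++]);
  }
}

// Scatter: every shard sorts its own documents. Gather: merge the results.
function scatterGatherSorted(shards, field) {
  const byField = (a, b) =>
    a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0;
  const partials = shards.map((docs) => [...docs].sort(byField));
  return mergeSorted(partials, byField);
}

// Three shards holding documents in no particular order of "score".
const shardData = [
  [{ score: 9 }, { score: 1 }, { score: 4 }],
  [{ score: 3 }, { score: 2 }],
  [{ score: 5 }],
];
console.log(scatterGatherSorted(shardData, "score").map((d) => d.score));
```

The merge step only ever compares the head of each partial result, so the router does not need to re-sort everything the shards return.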
Part Four
Developing with MongoDB
Platform and Language support
MongoDB is implemented in C++ for best performance
Platforms 32/64 bit
• Windows
• Linux, Mac OS X, FreeBSD, Solaris
Language drivers for
• Java
• Ruby / Ruby-on-Rails
• C#
• C / C++
• Erlang
• Python, Perl, JavaScript
• others...
.. and much more ! ..
ease of development a surprisingly big benefit : faster to code, faster to change, avoid upgrades and scheduled downtime
more predictable performance
fast single server performance -> developer spends less time manually coding around the database
bottom line: usually, developers like it much better after trying
MongoDB features
• Durability
• Replication
• Sharding
• Connection options
Durability
What failures do you need to recover from?
• Loss of a single database node?
• Loss of a group of nodes?
Durability - Master only
• Write acknowledged when in memory on master only
Durability - Master + Slaves
• Write acknowledged when in memory on master + slave
• Will survive failure of a single node
Durability - Master + Slaves + fsync
• Write acknowledged when in memory on master + slaves
• Pick a “majority” of nodes
• fsync in batches (since it is blocking)
Setting default error checking
// Do not check or report errors on write
com.mongodb.WriteConcern.NONE;
// Use default level of error check. Do not send
// a getLastError(), but raise exception on error
com.mongodb.WriteConcern.NORMAL;
// Send getLastError() after each write. Raise an
// exception on error
com.mongodb.WriteConcern.STRICT;
// Set the concern
db.setWriteConcern(concern);
Customized WriteConcern
// Wait for three servers to acknowledge write
WriteConcern concern =
new WriteConcern(3);
// Wait for three servers, with a 1000ms timeout
concern = new WriteConcern(3, 1000);
// Wait for three servers, 1000ms timeout and fsync
// data to disk
concern = new WriteConcern(3, 1000, true);
// Set the concern
db.setWriteConcern(concern);
Using Replication
slaveOk()
- driver to send read requests to Secondaries
- driver will always send writes to Primary
Can be set on
- DB.slaveOk()
- Collection.slaveOk()
- find(q).addOption(Bytes.QUERYOPTION_SLAVEOK);
Using sharding
Before sharding
coll.save(
BasicDBObjectBuilder.start("author", "Hergé").
append("text", "Destination Moon").
append("date", new Date()).get());
Query q = ds.find(Blog.class, "author", "Hergé");
After sharding
No code change required!

Connection options
MongoOptions mo = new MongoOptions();
// Restrict number of connections
mo.connectionsPerHost = MAX_THREADS + 5;
// Auto reconnection on connection failure
mo.autoConnectRetry = true;
Part Five
Deploying MongoDB
• Performance tuning
• Sizing
• O/S Tuning / File System layout
• Backup
Backup
• Typically backups are driven from a slave
• Eliminates impact to client / application traffic to master
Slave delay
• Protection against app faults

• Protection against administration mistakes
O/S Config
• RAM - lots of it
• Filesystem
• EXT4 / XFS
• Better file allocation & performance
• I/O
• The more disks the better
• Consider RAID10 or other RAID configs
Monitoring
• Munin, Cacti, Nagios
Primary function:
• Measure stats over time
• Tells you what is going on with
your system
• Alerts when threshold reached
Remember me?
Summary
MongoDB makes building applications simple
You can focus on what the app needs to do
MongoDB has built-in
• Horizontal scaling (reads and writes)
• Simplified schema evolution
• Simplified deployment and operation
• Best match for development tools and agile processes
download at mongodb.org
We’re Hiring !
alvin@10gen.com
conferences, appearances, and meetups

http://www.10gen.com/events
Facebook | Twitter | LinkedIn

http://bit.ly/mongoK @mongodb http://linkd.in/joinmongo
