Alvin Richards

alvin@10gen.com
Topics

Overview
Document Design
Modeling the “real world”
Replication & Sharding
Developing with MongoDB
Deployment
Drinking from the fire hose
Part One
MongoDB Overview
MongoDB is the leading database
for cloud deployment

web 2.0 companies started out using this


but now:
- enterprises
- financial industries

3 Reasons
- Performance
- Large number of readers / writers
- Large data volume
- Agility (ease of development)
NoSQL Really
Means:
non-relational, next-generation
operational datastores and databases
past: one-size-fits-all
- RDBMS (Oracle, MySQL)

present: business intelligence and analytics is now its own segment
- RDBMS (Oracle, MySQL)
- New Gen. OLAP (Vertica, Aster, Greenplum)

future
- RDBMS (Oracle, MySQL)
- New Gen. OLAP (Vertica, Aster, Greenplum)
- Non-relational Operational Stores ("NoSQL")

we claim the NoSQL segment will be:
* large
* not fragmented
* 'platformitize-able'
Philosophy: maximize features up to the "knee" in the curve, then stop

[Chart: scalability & performance plotted against depth of functionality, with memcached, key/value and RDBMS marked on it]


no joins + no complex transactions

Horizontally Scalable Architectures
no joins + no complex transactions

New Data Models

Improved ways to develop
Part Two
Data Modeling in MongoDB
So why model data?

http://www.flickr.com/photos/42304632@N00/493639870/
A brief history of normalization
• 1970 E.F.Codd introduces 1st Normal Form (1NF)
• 1971 E.F.Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)

Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query

* source : wikipedia
The real benefit of relational

• Before relational
• Data and Logic combined
• After relational
• Separation of concerns
• Data modeled independent of logic
• Logic freed from concerns of data design

• MongoDB continues this separation


Relational made normalized
data look like this
Document databases make
normalized data look like this
Terminology

RDBMS           MongoDB
Table           Collection
Row(s)          JSON Document
Index           Index
Join            Embedding & Linking
Partition       Shard
Partition Key   Shard Key
Create a document
Design documents that simply map to
your application
post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}

> db.posts.save(post)
Add an index, find via index
Secondary index for “author”

// 1 means ascending, -1 means descending

>db.posts.ensureIndex({author: 1})

>db.posts.find({author: 'Hergé'})

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Hergé",
... }
Explain a query plan
> db.posts.find({author: 'Hergé'}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : {
    "author" : [
      [
        "Hergé",
        "Hergé"
      ]
    ]
  }
}
Query operators
Conditional operators:
$lt, $lte, $gt, $gte, $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...

// find posts that have a tags field
> db.posts.find({tags: {$exists: true}})
Regular expressions:
// posts where author starts with h (case-insensitive)
> db.posts.find({author: /^h/i})
Counting:
// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()
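To make the operator semantics above concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB evaluates queries: `matchesField` and `find` are hypothetical helpers written only for illustration, the sample posts are made up, and only equality, regular expressions, `$exists`, `$gt` and `$in` are covered.

```javascript
// Hypothetical in-memory matcher; it illustrates operator semantics only.
const posts = [
  { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] },
  { author: "Hergé", text: "Explorers on the Moon" },
  { author: "Goscinny", text: "Asterix the Gaul", tags: ["comic"] },
];

// Test one field value against one condition from the query document.
function matchesField(value, cond) {
  if (cond instanceof RegExp) {
    return typeof value === "string" && cond.test(value);
  }
  if (cond !== null && typeof cond === "object") {
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case "$exists": return (value !== undefined) === arg;
        case "$gt": return value > arg;
        case "$in": return arg.includes(value);
        default: throw new Error("operator not covered by this sketch: " + op);
      }
    });
  }
  return value === cond; // plain equality
}

// Keep the documents for which every field condition holds.
function find(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, cond]) => matchesField(doc[field], cond))
  );
}

const withTags = find(posts, { tags: { $exists: true } });
const startsWithH = find(posts, { author: /^h/i });
const hergeCount = find(posts, { author: "Hergé" }).length;
console.log(withTags.length, startsWithH.length, hergeCount);
```

The query document is just data, which is why the same `find` call shape serves equality, pattern and existence checks.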
Part Three
Modeling the “real world”
Inheritance
Single Table Inheritance - RDBMS

shapes table
id  type    area  radius  d  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10               5       2
Single Table Inheritance

>db.shapes.find()
{ _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1}
{ _id: ObjectId("..."), type: "square", area: 4, d: 2}
{ _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2}

// find shapes where radius > 0


>db.shapes.find({radius: {$gt: 0}})

// create index
>db.shapes.ensureIndex({radius: 1})
One to Many
One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle
One to Many
- Embedded Array / Array Keys
  - $slice operator to return subset of array
  - some queries hard,
    e.g. find latest comments across all documents

- Embedded tree
  - Single document
  - Natural
  - Hard to query

- Normalized (2 collections)
  - most flexible
  - more queries
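The trade-off between embedding and normalizing can be sketched with plain JavaScript objects. The document shapes, field names (`comments`, `post_id`, `date`) and sample values below are assumptions chosen for illustration; they do not come from the slides.

```javascript
// Embedded array: comments live inside each post document.
const embeddedPosts = [
  { _id: 1, text: "Destination Moon", comments: [
      { author: "Kyle", date: 3, text: "great book" },
      { author: "Ann", date: 1, text: "first" } ] },
  { _id: 2, text: "Explorers on the Moon", comments: [
      { author: "Bob", date: 2, text: "sequel" } ] },
];

// Normalized: comments are a second collection that links back by post_id.
const comments = [
  { post_id: 1, author: "Kyle", date: 3, text: "great book" },
  { post_id: 1, author: "Ann", date: 1, text: "first" },
  { post_id: 2, author: "Bob", date: 2, text: "sequel" },
];

// Reading one post with its comments is a single lookup when embedded...
const onePost = embeddedPosts.find((p) => p._id === 1);
// ...but needs a second lookup when normalized.
const onePostComments = comments.filter((c) => c.post_id === 1);

// "Latest comments across all documents": embedded data must be flattened first,
const latestEmbedded = embeddedPosts
  .flatMap((p) => p.comments)
  .sort((a, b) => b.date - a.date)
  .slice(0, 2);
// while the normalized collection can be sorted directly.
const latestNormalized = [...comments]
  .sort((a, b) => b.date - a.date)
  .slice(0, 2);

console.log(onePost.comments.length, onePostComments.length);
console.log(latestEmbedded.map((c) => c.author), latestNormalized.map((c) => c.author));
```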
One to Many - patterns

- Embedded Array / Array Keys
- Embedded tree
- Normalized
Many - Many
Example:

- Product can be in many categories


- Category can have many products
Many - Many
products:
    { _id: ObjectId("4c4ca23933fb5941681b912e"),
      name: "Destination Moon",
      category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                      ObjectId("4c4ca25433fb5941681b92af") ]}

categories:
    { _id: ObjectId("4c4ca25433fb5941681b912f"),
      name: "Adventure",
      product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
                     ObjectId("4c4ca30433fb5941681b9130"),
                     ObjectId("4c4ca30433fb5941681b913a") ]}

// All categories for a given product
> db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
Alternative
products:
    { _id: ObjectId("4c4ca23933fb5941681b912e"),
      name: "Destination Moon",
      category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                      ObjectId("4c4ca25433fb5941681b92af") ]}

categories:
    { _id: ObjectId("4c4ca25433fb5941681b912f"),
      name: "Adventure"}

// All products for a given category
> db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

// All categories for a given product (two steps)
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
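The two lookups in this alternative layout can be sketched in memory as follows. This is only an illustration: the short string ids stand in for ObjectId values, the second product and category are invented sample data, and `productsInCategory` and `categoriesOfProduct` are hypothetical helper names.

```javascript
// Sample data mirroring the alternative layout: only products hold the links.
const products = [
  { _id: "p1", name: "Destination Moon", category_ids: ["c1", "c2"] },
  { _id: "p2", name: "Explorers on the Moon", category_ids: ["c1"] },
];
const categories = [
  { _id: "c1", name: "Adventure" },
  { _id: "c2", name: "Comic" },
];

// All products for a given category: one query on the array field.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// All categories for a given product: fetch the product, then an $in-style lookup.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory("c1").map((p) => p.name));
console.log(categoriesOfProduct("p1").map((c) => c.name));
```

Storing the links on one side only means a category document never has to be rewritten when a product changes, at the cost of a second query in one direction.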
Modeling - Summary

• Ability to model rich data constructions


• Relationships (1-1, 1-M, M-M)
• Trees
• Queues, Stacks
• Simple to change your data design
• Quickly map your application needs to data needs
Part Three
Replication & Sharding
Scaling

• Data size only goes up


• Operations/sec only go up
• Vertical scaling is limited
• Hard to scale vertically in the cloud
• Can scale wider than higher

What is scaling?
Well - hopefully for everyone here.
Read Scalability : Replication
[Diagram: ReplicaSet 1 (Primary, Secondary, Secondary), with "read" and "write" arrows]
Basics
• MongoDB replication is a bit like RDBMS replication
Asynchronous master/slave at its core
• Variations:
Master / slave
Replica Sets
Replica Sets
• A cluster of N servers
• Any (one) node can be primary
• Consensus election of primary
• Automatic failover
• Automatic recovery
• All writes to primary
• Reads can be to primary (default) or a secondary
Replica Sets – Design Concepts

1. A write is durable once available on a majority of members
2. Writes may be visible before a cluster-wide commit has been completed
3. On a failover, if data has not been replicated from the primary, the data is dropped (see #1)
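The "majority" rule in point 1 comes down to simple arithmetic, sketched below. `majority` and `isDurable` are hypothetical helper names for illustration, not MongoDB functions.

```javascript
// Smallest number of members that forms a majority of an n-member set.
function majority(n) {
  return Math.floor(n / 2) + 1;
}

// A write counts as durable (in the sense of point 1) once a majority holds it.
function isDurable(membersWithWrite, setSize) {
  return membersWithWrite >= majority(setSize);
}

console.log(majority(3), majority(5)); // 2 3
console.log(isDurable(1, 3)); // false: only the primary has the write
console.log(isDurable(2, 3)); // true: primary plus one secondary
```

In a three-member set, a write that reached only the primary can therefore be lost on failover, which is the case point 3 describes.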
Replica Set: Establishing

Member 1
Member 3

Member 2
Replica Set: Electing primary

Member 1
Member 3

Member 2
PRIMARY
Replica Set: Failure of master

Member 1
Member 3
PRIMARY
(negotiate new master)

Member 2
DOWN
Replica Set: Reconfiguring

Member 1
Member 3
PRIMARY

Member 2
DOWN
Replica Set: Member recovers

Member 1
Member 3
PRIMARY

Member 2
RECOVERING
Replica Set: Active

Member 1
Member 3
PRIMARY

Member 2
Write Scalability: Sharding
         key range      key range      key range
read     0 .. 30        31 .. 60       61 .. 100

         ReplicaSet 1   ReplicaSet 2   ReplicaSet 3

         Primary        Primary        Primary
         Secondary      Secondary      Secondary
         Secondary      Secondary      Secondary

write
Sharding

• Scale horizontally for data size, index size, write and consistent read scaling

• Distribute databases, collections or objects in a collection

• Auto-balancing, migrations, management happen with no down time

• Replica Sets for inconsistent read scaling

Sharding

• Choose how you partition data


• Can convert from single master to sharded system
with no downtime
• Same features as non-sharding single master
• Fully consistent
Range Based

• collection is broken into chunks by range
• chunks default to 200 MB or 100,000 objects
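Range-based partitioning can be sketched as a lookup from shard key value to the range that contains it. The ranges below reuse the key ranges from the sharding diagram (0..30, 31..60, 61..100); `shardFor` is a hypothetical helper, and real chunk metadata and balancing are more involved than this.

```javascript
// Key ranges taken from the sharding diagram; bounds are inclusive.
const chunks = [
  { min: 0, max: 30, shard: "ReplicaSet 1" },
  { min: 31, max: 60, shard: "ReplicaSet 2" },
  { min: 61, max: 100, shard: "ReplicaSet 3" },
];

// Return the shard whose range contains the shard key value.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key <= c.max);
  if (!chunk) throw new RangeError("no chunk covers key " + key);
  return chunk.shard;
}

console.log(shardFor(12)); // ReplicaSet 1
console.log(shardFor(31)); // ReplicaSet 2
console.log(shardFor(100)); // ReplicaSet 3
```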
Architecture
[Diagram: Shards (groups of mongod processes), Config Servers (three mongod processes), and mongos routers (mongos, mongos, ...); the client connects through mongos]
Writes

• Inserts : require shard key, routed


• Removes: routed and/or scattered
• Updates: routed or scattered
Queries

• By shard key: routed


• Sorted by shard key: routed in order
• By non shard key: scatter gather
• Sorted by non shard key: distributed merge sort
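The difference between routed and scatter/gather queries, and the merge step for a sort on a non shard key, can be sketched in memory. Everything here is illustrative: the shard key `k`, the `name` field, the sample documents and the helper names are assumptions, and the merge is a simple k-way merge rather than MongoDB's actual implementation.

```javascript
// Three shards, each holding documents for one range of the shard key `k`.
const shards = [
  { min: 0, max: 30, docs: [{ k: 5, name: "e" }, { k: 20, name: "b" }] },
  { min: 31, max: 60, docs: [{ k: 40, name: "d" }] },
  { min: 61, max: 100, docs: [{ k: 70, name: "a" }, { k: 90, name: "c" }] },
];

// Query by shard key: routed to the single shard that owns the range.
function findByShardKey(k) {
  const target = shards.find((s) => k >= s.min && k <= s.max);
  const docs = target ? target.docs.filter((d) => d.k === k) : [];
  return { shardsContacted: target ? 1 : 0, docs };
}

// Query by another field: scatter to every shard, then gather the results.
function findByName(name) {
  const docs = shards.flatMap((s) => s.docs.filter((d) => d.name === name));
  return { shardsContacted: shards.length, docs };
}

// Sort by a non shard key: each shard sorts locally, the router merges.
function findAllSortedByName() {
  const runs = shards.map((s) =>
    [...s.docs].sort((a, b) => a.name.localeCompare(b.name))
  );
  const merged = [];
  while (runs.some((r) => r.length > 0)) {
    // Pick the run whose head is smallest and take that head.
    const next = runs
      .filter((r) => r.length > 0)
      .reduce((best, r) => (r[0].name.localeCompare(best[0].name) < 0 ? r : best));
    merged.push(next.shift());
  }
  return merged;
}

console.log(findByShardKey(40).shardsContacted); // 1
console.log(findByName("a").shardsContacted); // 3
console.log(findAllSortedByName().map((d) => d.name).join("")); // abcde
```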
Part Four
Developing with MongoDB
Platform and Language support
MongoDB is implemented in C++ for best performance

Platforms 32/64 bit
• Windows
• Linux, Mac OS X, FreeBSD, Solaris
Language drivers for
• Java
• Ruby / Ruby-on-Rails
• C#
• C / C++
• Erlang
• Python, Perl, JavaScript
• others...
... and much more!

• ease of development is a surprisingly big benefit: faster to code, faster to change, avoids upgrades and scheduled downtime
• more predictable performance
• fast single-server performance -> developer spends less time manually coding around the database
• bottom line: usually, developers like it much better after trying it
MongoDB features

• Durability
• Replication
• Sharding
• Connection options
Durability
What failures do you need to recover from?
• Loss of a single database node?
• Loss of a group of nodes?
Durability - Master only

• Write acknowledged
when in memory on
master only
Durability - Master + Slaves
• Write acknowledged when in memory on master + slave
• Will survive failure of a single node
Durability - Master + Slaves + fsync
• Write acknowledged when in memory on master + slaves
• Pick a "majority" of nodes
• fsync in batches (since it is blocking)
Setting default error checking
// Do not check or report errors on write
com.mongodb.WriteConcern.NONE;

// Use default level of error check. Do not send
// a getLastError(), but raise exception on error
com.mongodb.WriteConcern.NORMAL;

// Send getLastError() after each write. Raise an
// exception on error
com.mongodb.WriteConcern.STRICT;

// Set the concern
db.setWriteConcern(concern);
Customized WriteConcern
// Wait for three servers to acknowledge write
WriteConcern concern =
    new WriteConcern(3);

// Wait for three servers, with a 1000ms timeout
WriteConcern concern =
    new WriteConcern(3, 1000);

// Wait for three servers, 1000ms timeout and fsync
// data to disk
WriteConcern concern =
    new WriteConcern(3, 1000, true);

// Set the concern
db.setWriteConcern(concern);
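The effect of the server count and timeout arguments can be modeled with a small sketch. This is a simplified model, not driver behavior: `writeAcknowledged` is a hypothetical function, and the per-server confirmation times are made-up inputs.

```javascript
// ackTimesMs: hypothetical time (ms) at which each server confirms the write.
// w: number of servers that must confirm; wtimeoutMs: how long to wait.
function writeAcknowledged(ackTimesMs, w, wtimeoutMs) {
  if (w <= 0) return true; // nothing to wait for
  if (w > ackTimesMs.length) return false; // not enough servers exist
  const sorted = [...ackTimesMs].sort((a, b) => a - b);
  // The w-th fastest confirmation decides when the write is acknowledged.
  return sorted[w - 1] <= wtimeoutMs;
}

console.log(writeAcknowledged([5, 40, 300], 3, 1000)); // true
console.log(writeAcknowledged([5, 40, 1500], 3, 1000)); // false: third server too slow
console.log(writeAcknowledged([5, 40, 1500], 2, 1000)); // true: two servers suffice
```

Waiting for more servers buys durability at the price of latency, since the slowest required server sets the pace.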
Using Replication

slaveOk()
- driver to send read requests to Secondaries
- driver will always send writes to Primary

Can be set on
- DB.slaveOk()
- Collection.slaveOk()
- find(q).addOption(Bytes.QUERYOPTION_SLAVEOK);
Using sharding

Before sharding

coll.save(
    BasicDBObjectBuilder.start("author", "Hergé")
        .append("text", "Destination Moon")
        .append("date", new Date())
        .get());

Query q = ds.find(Blog.class, "author", "Hergé");

After sharding

No  code  change  required!


Connection options

MongoOptions mo = new MongoOptions();

// Restrict number of connections
mo.connectionsPerHost = MAX_THREADS + 5;

// Auto reconnection on connection failure
mo.autoConnectRetry = true;
Part Five
Deploying MongoDB

• Performance tuning
• Sizing
• O/S Tuning / File System layout
• Backup
Backup
• Typically backups are driven from a slave
• Eliminates impact to client / application traffic to master
Slave delay

• Protection against app faults


• Protection against administration mistakes
O/S Config

• RAM - lots of it
• Filesystem
• EXT4 / XFS
• Better file allocation & performance

• I/O
• The more disks the better
• Consider RAID10 or other RAID configs
Monitoring

• Munin, Cacti, Nagios

Primary function:
• Measure stats over time
• Tells you what is going on with
your system
• Alerts when threshold reached
Remember me?
Summary

MongoDB makes building applications simple

You can focus on what the app needs to do

MongoDB has built-in

• Horizontal scaling (reads and writes)
• Simplified schema evolution
• Simplified deployment and operation
• Best match for development tools and agile processes
download at mongodb.org

We’re Hiring !
alvin@10gen.com

conferences, appearances, and meetups
http://www.10gen.com/events

Facebook: http://bit.ly/mongoK
Twitter: @mongodb
LinkedIn: http://linkd.in/joinmongo
