
Schema Design

Alvin Richards
alvin@10gen.com
Topics

Introduction
• Basic Data Modeling
• Evolving a schema
Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues
So why model data?

http://www.flickr.com/photos/42304632@N00/493639870/
A brief history of normalization
• 1970 E.F.Codd introduces 1st Normal Form (1NF)
• 1971 E.F.Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)

Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query

* source : wikipedia
The real benefit of relational

• Before relational
• Data and Logic combined
• After relational
• Separation of concerns
• Data modeled independent of logic
• Logic freed from concerns of data design

• MongoDB continues this separation


Relational made normalized
data look like this
Document databases make
normalized data look like this
Terminology

RDBMS           MongoDB
Table           Collection
Row(s)          JSON Document
Index           Index
Join            Embedding & Linking
Partition       Shard
Partition Key   Shard Key
DB Considerations
How can we manipulate this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce

Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle
Considerations
• No Joins
• Document writes are atomic
So today’s example will use...
Design Session
Design documents that simply map to
your application
post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}

> db.posts.save(post)
Find the document
> db.posts.find()

   { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
     author: "Hergé",
     date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
     text: "Destination Moon",
     tags: [ "comic", "adventure" ]
   }

Notes:
• ID must be unique, but can be anything you’d like
• MongoDB will generate a default ID if one is not
supplied
Add an index, find via index
Secondary index for "author"

 // 1 means ascending, -1 means descending
 > db.posts.ensureIndex({author: 1})

 > db.posts.find({author: 'Hergé'})

     { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
       date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
       author: "Hergé",
       ... }
Verifying indexes exist
> db.system.indexes.find()

// Index on ID
   { name: "_id_",
     ns: "test.posts",
     key: { "_id" : 1 } }

// Index on author
   { _id: ObjectId("4c4ba6c5672c685e5e8aabf4"),
     ns: "test.posts",
     key: { "author" : 1 },
     name: "author_1" }
Examine the query plan
> db.posts.find({author: 'Hergé'}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : {
    "author" : [
      [
        "Hergé",
        "Hergé"
      ]
    ]
  }
}
Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
$lt, $lte, $gt, $gte

// find posts with any tags
> db.posts.find({tags: {$exists: true}})

Regular expressions:
// posts where author starts with h
> db.posts.find({author: /^h/i})

Counting:
// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()
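
To make the operator semantics concrete, here is a rough plain-JavaScript sketch of how a few of the operators above decide whether a document matches. The `matches` helper is hypothetical and written only for illustration. It is not MongoDB's implementation, and it covers just `$exists`, `$gt`, `$in`, equality and regular expressions on top-level fields.

```javascript
// Hypothetical helper: tests one document against a tiny subset of query syntax.
function matches(doc, query) {
  return Object.keys(query).every(function (field) {
    var cond = query[field];
    var value = doc[field];
    // For array fields, a condition matches if any element matches.
    var values = Array.isArray(value) ? value : [value];
    if (cond instanceof RegExp) {
      return values.some(function (v) {
        return typeof v === "string" && cond.test(v);
      });
    }
    if (cond !== null && typeof cond === "object") {
      return Object.keys(cond).every(function (op) {
        switch (op) {
          case "$exists": return (value !== undefined) === cond[op];
          case "$gt": return values.some(function (v) { return v > cond[op]; });
          case "$in": return values.some(function (v) { return cond[op].indexOf(v) !== -1; });
          default: throw new Error("operator not covered by this sketch: " + op);
        }
      });
    }
    // Plain value: equality against the field or any array element.
    return values.some(function (v) { return v === cond; });
  });
}

// Sample data; the second post is made up for the example.
var posts = [
  {author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"]},
  {author: "Kyle", text: "untagged post"}
];

var tagged = posts.filter(function (p) { return matches(p, {tags: {$exists: true}}); });
var startsWithH = posts.filter(function (p) { return matches(p, {author: /^h/i}); });
var hergeCount = posts.filter(function (p) { return matches(p, {author: "Hergé"}); }).length;
```

The three variables at the end correspond to the three shell queries on this slide.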
Extending the Schema

 new_comment = {author: "Kyle",
                date: new Date(),
                text: "great book"}

 > db.posts.update(
       {text: "Destination Moon"},
       {'$push': {comments: new_comment},
        '$inc':  {comments_count: 1}})
Extending the Schema

   { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
     author : "Hergé",
     date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
     text : "Destination Moon",
     tags : [ "comic", "adventure" ],

     comments : [
       {
         author : "Kyle",
         date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
         text : "great book"
       }
     ],
     comments_count: 1
   }
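
As a rough illustration of what the update does to the document, the sketch below applies `$push` and `$inc` to a plain JavaScript object. `applyUpdate` is a hypothetical helper, not a MongoDB API. It only mirrors the documented behaviour that `$push` creates the array when the field is missing and `$inc` starts from 0 when the field is missing.

```javascript
// Hypothetical helper mirroring what $push and $inc do to a single document.
function applyUpdate(doc, update) {
  var pushes = update.$push || {};
  var incs = update.$inc || {};
  Object.keys(pushes).forEach(function (field) {
    if (doc[field] === undefined) { doc[field] = []; } // $push creates a missing array
    doc[field].push(pushes[field]);
  });
  Object.keys(incs).forEach(function (field) {
    doc[field] = (doc[field] || 0) + incs[field];      // $inc treats a missing field as 0
  });
  return doc;
}

var post = {author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"]};

applyUpdate(post, {
  $push: {comments: {author: "Kyle", date: new Date(), text: "great book"}},
  $inc: {comments_count: 1}
});
```

No schema change was declared anywhere: the `comments` and `comments_count` fields simply appear on the first update that uses them.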
   
Extending the Schema
// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})

> db.posts.find({"comments.author": "Kyle"})

// find last 5 posts:
> db.posts.find().sort({date: -1}).limit(5)

// most commented post:
> db.posts.find().sort({comments_count: -1}).limit(1)

When sorting, check whether you need an index


Watch for full table scans (a BasicCursor means no index was used)

> db.posts.find({text: 'Destination Moon'}).explain()

{
  "cursor" : "BasicCursor",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 0,
  "indexBounds" : {

  }
}
Map Reduce
Map reduce: count tags
mapFunc = function () {
    this.tags.forEach(function (z) { emit(z, {count: 1}); });
}

reduceFunc = function (k, v) {
    var total = 0;
    for (var i = 0; i < v.length; i++) {
        total += v[i].count;
    }
    return {count: total};
}

res = db.posts.mapReduce(mapFunc, reduceFunc)

> db[res.result].find()
     { _id : "comic", value : { count : 1 } }
     { _id : "adventure", value : { count : 1 } }
   
Group

• Equivalent to a Group By in SQL

• Specify the attributes to group the data by

• Process the results in a Reduce function


Group - Count posts by Author
cmd = { key: { "author": true },
        initial: {count: 0},
        reduce: function (obj, prev) {
            prev.count++;
        }
      };
result = db.posts.group(cmd);

[
  {
    "author" : "Hergé",
    "count" : 1
  },
  {
    "author" : "Kyle",
    "count" : 3
  }
]
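
For readers who think in SQL, the following in-memory sketch shows roughly what the group command computes. The `group` function here is a hypothetical stand-in, not the server implementation, and the four sample posts are invented so that the output matches the result shown above.

```javascript
// Hypothetical in-memory stand-in for db.collection.group(cmd).
function group(docs, cmd) {
  var buckets = {};
  var keyFields = Object.keys(cmd.key);
  docs.forEach(function (doc) {
    var keyDoc = {};
    keyFields.forEach(function (f) { keyDoc[f] = doc[f]; });
    var id = JSON.stringify(keyDoc);
    if (!buckets[id]) {
      // Each group starts from its own copy of `initial`, plus the key fields.
      buckets[id] = Object.assign({}, keyDoc, JSON.parse(JSON.stringify(cmd.initial)));
    }
    cmd.reduce(doc, buckets[id]); // reduce mutates the running aggregate
  });
  return Object.keys(buckets).map(function (id) { return buckets[id]; });
}

// Invented sample data: one post by Hergé, three by Kyle.
var posts = [
  {author: "Hergé"}, {author: "Kyle"}, {author: "Kyle"}, {author: "Kyle"}
];

var result = group(posts, {
  key: {author: true},
  initial: {count: 0},
  reduce: function (obj, prev) { prev.count++; }
});
```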
Review

So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more
http://devilseve.blogspot.com/2010/06/like-drinking-from-fire-hose.html
Inheritance
Single Table Inheritance - RDBMS

shapes table
id  type    area  radius  d  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10               5       2
Single Table Inheritance - MongoDB
> db.shapes.find()
 { _id: "1", type: "circle", area: 3.14, radius: 1}
 { _id: "2", type: "square", area: 4, d: 2}
 { _id: "3", type: "rect",   area: 10, length: 5, width: 2}

// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create index
> db.shapes.ensureIndex({radius: 1})
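
On the application side, one collection holding several shapes usually means dispatching on the `type` field. The sketch below is one possible way to do that. `computeArea` is a hypothetical helper, and the formulas assume `d` is the side of the square, which is consistent with the sample data (d: 2, area: 4).

```javascript
// Hypothetical helper: each type reads only the fields that exist for it.
function computeArea(shape) {
  switch (shape.type) {
    case "circle": return Math.PI * shape.radius * shape.radius;
    case "square": return shape.d * shape.d;
    case "rect":   return shape.length * shape.width;
    default: throw new Error("unknown shape type: " + shape.type);
  }
}

var shapes = [
  {_id: "1", type: "circle", radius: 1},
  {_id: "2", type: "square", d: 2},
  {_id: "3", type: "rect", length: 5, width: 2}
];

var areas = shapes.map(computeArea);
```

Unlike the relational table, no document carries empty columns for attributes that belong to other shapes.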
One to Many
One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle
One to Many
- Embedded Array / Array Keys
  - $slice operator to return a subset of the array
  - some queries are hard,
    e.g. find the latest comments across all documents

- Embedded tree
  - Single document
  - Natural
  - Hard to query

- Normalized (2 collections)
  - most flexible
  - more queries
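
To see why "latest comments across all documents" is awkward with embedded arrays, consider doing it in application code. The sketch below assumes every post has already been fetched. The sample posts and the `latestComments` helper are invented for the example.

```javascript
// Invented sample data: comments are embedded inside each post.
var posts = [
  {text: "Destination Moon", comments: [
    {author: "Kyle", date: new Date("2010-07-24T20:51:03Z"), text: "great book"}
  ]},
  {text: "another post", comments: [
    {author: "Fred", date: new Date("2010-07-25T09:00:00Z"), text: "nice"},
    {author: "Kyle", date: new Date("2010-07-23T08:00:00Z"), text: "first"}
  ]}
];

// Hypothetical helper: flatten every embedded array, then sort and trim.
function latestComments(allPosts, n) {
  var all = [];
  allPosts.forEach(function (post) {
    (post.comments || []).forEach(function (c) { all.push(c); });
  });
  all.sort(function (a, b) { return b.date - a.date; }); // newest first
  return all.slice(0, n);
}

var latest = latestComments(posts, 2);
```

With the normalized layout (a separate comments collection), the same question becomes a single sorted, limited query, at the cost of an extra query whenever a post is displayed with its comments.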
One to Many - patterns

- Embedded Array / Array Keys
- Embedded tree
- Normalized
Many - Many
Example:

- Product can be in many categories


- Category can have many products
Many - Many
products:
   { _id: ObjectId("4c4ca23933fb5941681b912e"),
     name: "Destination Moon",
     category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                     ObjectId("4c4ca25433fb5941681b92af") ] }

categories:
   { _id: ObjectId("4c4ca25433fb5941681b912f"),
     name: "adventure",
     product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
                    ObjectId("4c4ca30433fb5941681b9130"),
                    ObjectId("4c4ca30433fb5941681b913a") ] }

// All categories for a given product
> db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
Alternative
products:
   { _id: ObjectId("4c4ca23933fb5941681b912e"),
     name: "Destination Moon",
     category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                     ObjectId("4c4ca25433fb5941681b92af") ] }

categories:
   { _id: ObjectId("4c4ca25433fb5941681b912f"),
     name: "adventure" }

// All products for a given category
> db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

// All categories for a given product (two queries)
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
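
The trade-off in this alternative is that references live on one side only, so one direction is a single lookup and the other takes two steps. A minimal in-memory sketch of both directions follows. The short string ids and the helper names are invented for readability and stand in for ObjectIds and real queries.

```javascript
// Invented sample data: only products hold references.
var categories = [
  {_id: "c1", name: "adventure"},
  {_id: "c2", name: "comic"},
  {_id: "c3", name: "history"}
];
var products = [
  {_id: "p1", name: "Destination Moon", category_ids: ["c1", "c2"]},
  {_id: "p2", name: "another product", category_ids: ["c3"]}
];

// One step: match on the array field (like find({category_ids: id})).
function productsIn(categoryId) {
  return products.filter(function (p) {
    return p.category_ids.indexOf(categoryId) !== -1;
  });
}

// Two steps: fetch the product, then fetch categories by id (like $in).
function categoriesFor(productId) {
  var product = products.filter(function (p) { return p._id === productId; })[0];
  if (!product) { return []; }
  return categories.filter(function (c) {
    return product.category_ids.indexOf(c._id) !== -1;
  });
}

var adventureProducts = productsIn("c1");
var moonCategories = categoriesFor("p1");
```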
Trees
Full Tree in Document

{ comments: [
     { author: "Kyle", text: "...",
       replies: [
           { author: "Fred", text: "...",
             replies: [] }
       ] }
  ]
}

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 4MB document size limit (raised to 16MB in later MongoDB releases)

   
Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent

Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / children)
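
With parent links alone, finding all ancestors of a node means following one link at a time. The sketch below walks an in-memory map. The node data and `ancestorsOf` are invented for the example, and against a database each loop iteration would be a separate query.

```javascript
// Invented sample tree stored as parent links: a -> b -> c
var nodes = {
  a: {_id: "a", parent: null},
  b: {_id: "b", parent: "a"},
  c: {_id: "c", parent: "b"}
};

// Hypothetical helper: follow parent links up to the root.
function ancestorsOf(id) {
  var result = [];
  var current = nodes[id];
  while (current && current.parent) {
    result.unshift(current.parent);    // keep root-first order
    current = nodes[current.parent];   // one lookup per level
  }
  return result;
}

var ancestorsOfC = ancestorsOf("c");
```

The number of lookups grows with the depth of the node, which is the cost that storing an array of ancestors on each node avoids.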
Array of Ancestors
- Store all ancestors of a node
   { _id: "a" }
   { _id: "b", ancestors: [ "a" ], parent: "a" }
   { _id: "c", ancestors: [ "a", "b" ], parent: "b" }
   { _id: "d", ancestors: [ "a", "b" ], parent: "b" }
   { _id: "e", ancestors: [ "a" ], parent: "a" }
   { _id: "f", ancestors: [ "a", "e" ], parent: "e" }

// find all descendants of b:
> db.tree2.find({ancestors: 'b'})

// find all direct descendants of b:
> db.tree2.find({parent: 'b'})

// find all ancestors of f:
> ancestors = db.tree2.findOne({_id: 'f'}).ancestors
> db.tree2.find({_id: {$in: ancestors}})
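
The slides show the stored documents but not how the `ancestors` array gets filled in. One plausible approach, sketched here in memory, is to copy the parent's array and append the parent's id when a node is inserted. `addNode` and the `tree` map are hypothetical names for this illustration.

```javascript
var tree = {}; // in-memory stand-in for the tree2 collection, keyed by _id

// Hypothetical helper: a child's ancestors are the parent's ancestors plus the parent.
function addNode(id, parentId) {
  var node = {_id: id};
  if (parentId !== undefined) {
    var parent = tree[parentId];
    node.ancestors = (parent.ancestors || []).concat(parentId);
    node.parent = parentId;
  }
  tree[id] = node;
  return node;
}

addNode("a");
addNode("b", "a");
addNode("c", "b");
addNode("d", "b");
addNode("e", "a");
addNode("f", "e");

// Equivalent of find({ancestors: 'b'}): every node whose array contains "b".
var descendantsOfB = Object.keys(tree).filter(function (id) {
  return (tree[id].ancestors || []).indexOf("b") !== -1;
});
```

The price of this layout is paid on writes: moving a subtree means rewriting the `ancestors` array of every node beneath it.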
Trees as Paths
Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. "/"
- Use text search to find parts of a tree

{ comments: [
     { author: "Kyle", text: "initial post",
       path: "/" },
     { author: "Jim",  text: "jim's comment",
       path: "/jim" },
     { author: "Kyle", text: "Kyle's reply to Jim",
       path: "/jim/kyle" } ] }

// Find the conversations Jim was part of
> db.posts.find({"comments.path": /^\/jim/i})
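
A detail worth checking with path prefixes is that a bare prefix also matches siblings with a longer name (for example "/jimmy"). The sketch below builds a pattern that stops at a delimiter or the end of the path. `escapeRegExp` and `subtreePattern` are hypothetical helpers, and the "/jimmy" comment is invented to show the difference.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: matches the node itself and anything beneath it.
function subtreePattern(path) {
  return new RegExp("^" + escapeRegExp(path) + "(/|$)");
}

var comments = [
  {author: "Kyle", path: "/"},
  {author: "Jim", path: "/jim"},
  {author: "Kyle", path: "/jim/kyle"},
  {author: "Jimmy", path: "/jimmy"}   // invented sibling with a similar name
];

var pattern = subtreePattern("/jim");
var jimThread = comments.filter(function (c) { return pattern.test(c.path); });
```

Because the pattern is anchored with `^` and is case-sensitive, it is the kind of regular expression that can make use of an index on the path field.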
Queue
Requirements
• See jobs waiting, jobs in progress
• Ensure that each job is started once and only once
   { inprogress: false,
     priority: 1,
     ...
   }

// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
          query:  {inprogress: false},
          sort:   {priority: -1},
          update: {$set: {inprogress: true,
                          started: new Date()}},
          new: true})
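
The selection logic of that call can be sketched in plain JavaScript. This is only an illustration of which job gets picked and how it is marked. It does not provide the once-and-only-once guarantee, which in the real system comes from findAndModify performing the find and the update as one atomic operation on the server. `claimNextJob` and the `name` field are invented for the example.

```javascript
// Hypothetical helper: pick the highest-priority waiting job and mark it.
function claimNextJob(jobs) {
  var waiting = jobs.filter(function (j) { return !j.inprogress; });
  if (waiting.length === 0) { return null; }  // nothing left to claim
  waiting.sort(function (a, b) { return b.priority - a.priority; });
  var job = waiting[0];
  job.inprogress = true;       // mirrors $set: {inprogress: true, ...}
  job.started = new Date();
  return job;                  // mirrors new: true (the updated document)
}

var jobs = [
  {name: "low",  inprogress: false, priority: 1},
  {name: "high", inprogress: false, priority: 5}
];

var first = claimNextJob(jobs);
var second = claimNextJob(jobs);
var third = claimNextJob(jobs);
```

Jobs waiting and jobs in progress can then be listed by filtering on the `inprogress` flag, which covers the first requirement.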
Remember me?

http://devilseve.blogspot.com/2010/06/like-drinking-from-fire-hose.html
Summary

Schema design is different in MongoDB

Basic data design principles stay the same

Focus on how the app manipulates data

Rapidly evolve schema to meet your requirements

Enjoy your new freedom, use it wisely :-)


download at mongodb.org

We’re Hiring !
alvin@10gen.com

conferences,  appearances,  and  meetups


http://www.10gen.com/events

Facebook                    |                  Twitter                  |                  LinkedIn


http://bit.ly/mongo>   @mongodb http://linkd.in/joinmongo
