MongoDB Schema Design Basics

open-‐source,
high-‐performance,
document-‐oriented database
Schema Design
Basics
Alvin Richards
alvin@10gen.com

This talk

Part One Part Two
‣  Intro ‣  Data modeling

‣  Terms / Definitions ‣  DBRef
‣  Single Table
Inheritance
‣  Getting a flavor
‣  Many – Many
‣  Creating a Schema
‣  Trees
‣  Indexes
‣  Lists / Queues /
‣  Evolving the
Stacks
Schema
So why model data?
A brief history of normalization
•  1970 E.F.Codd introduces 1st Normal Form (1NF)

•  1971 E.F.Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)

•  1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)

•  2002 Date, Darween, Lorentzos define 6th Normal Form (6NF)

Goals:

•  Avoid anomalies when inserting, updating or deleting

•  Minimize redesign when extending the schema

•  Make the model informative to users

•  Avoid bias towards a particular style of query

* source : wikipedia

Relational made normalized
data look like this
Document databases make
normalized data look like this
Some terms before we proceed
RDBMS
Document DBs

Table
Collection

Row(s)
JSON Document

Index
Index

Join
Embedding & Linking across
documents

Partition
Shard

Partition Key
Shard Key

DB Considerations

How can we manipulate Access Patterns ?
this data ?
•  Read / Write Ratio
•  Dynamic Queries
•  Types of updates
•  Secondary Indexes
•  Types of queries
•  Atomic Updates
•  Data life-cycle

•  Map Reduce
Considerations
•  No Joins
•  Document writes are atomic
Design Session
Design documents that simply map to

your application
post = {author: “kyle”,

date: new Date(),
text: “my blog post...”,
tags: [“mongodb”, “intro”]}
>db.post.save(post)
Find the document
>db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "kyle",
date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
text : "My first blog",
tags : [ "mongodb", "intro" ] }
Notes:
•  ID must be unique, but can be anything you’d like
•  MongoDB will generate a default ID if one is not
supplied
Add and index, find via Index
Secondary index for “author”
// 1 means ascending, -1 means descending
>db.posts.ensureIndex({author: 1})
>db.posts.find({author: 'kyle'})
author : "kyle",
... }
Verifying indexes exist
>db.system.indexes.find()
// Index on ID
{ name : "_id_",
ns : "test.posts",
key : { "_id" : 1 } }
// Index on author
ns : "test.posts",
key : { "author" : 1 },
name : "author_1" }
Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ..
$lt, $lte, $gt, $gte, $ne,
// find posts with any tags

>db.posts.find({tags: {$exists: true}})
Query operators

Regular expressions:
// posts where author starts with k
>db.posts.find({author: /^k*/i })
Query operators

Regular expressions:
// posts where author starts with k
>db.posts.find({author: /^k*/i })
Counting:
// posts written by mike
>db.posts.find({author: “mike”}).count()
Extending the Schema
new_comment = {author: “fred”,

date: new Date(),
text: “super duper”}
new_info = { ‘$push’: {comments: new_comment},

‘$inc’: {comments_count: 1}}
>db.posts.update({_id: “...” }, new_info)

author : "kyle",
text : "My first blog",
tags : [ "mongodb", "intro" ],
comments_count: 1,
comments : [
{
author : "Fred",
text : "Super Duper"
}
]}
// create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})
>db.posts.find({comments.author:”kyle”})
// find last 5 posts:

>db.posts.find().sort({date:-1}).limit(5)
// find last 5 posts:

>db.posts.find().sort({date:-1}).limit(5)
// most commented post:

>db.posts.find().sort({comments_count:-1}).limit(1)
When sorting, check if you need an index

Map Reduce
Aggregation and batch manipulation
Collection in, Collection out
Parallel in sharded environments

Map reduce
mapFunc = function () {
this.tags.forEach(function (z) {emit(z, {count:1});});
}
reduceFunc = function (k, v) {

var total = 0;
for (var i = 0; i < v.length; i++) { total += v[i].count; }
return {count:total}; }
res = db.posts.mapReduce(mapFunc, reduceFunc)
>db[res.result].find()
{ _id : "intro", value : { count : 1 } }
{ _id : "mongodb", value : { count : 1 } }
Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more
Wordnik
9B records, 100M queries / week, 1.2TB
{
entry : {
header: { id: 0,
headword: "m",
sourceDictionary: "GCide",
textProns : [
{text: "(em)",
seq:0}
],
syllables: [
{id: 0,
text: "m"}
],

sourceDictionary: "1913 Webster",
headWord: "m",
id: 1,
definitions: : [
{text: "M, the thirteenth letter..."},
{text: "As a numeral, M stands for 1000"}]
}
}
}
Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more
Observations:
- Using Rich Documents works well
- Simplify relations by embedding them
- Iterative development is easy with MongoDB
Single Table Inheritance
>db.shapes.find()
{ _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1}
{ _id: ObjectId("..."), type: "square", area: 4, d: 2}
{ _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2}
// find shapes where radius > 0

>db.shapes.find({radius: {$gt: 0}})
// create index
>db.shapes.ensureIndex({radius: 1})
One to Many
- Embedded Array / Array Keys
- slice operator to return subset of array
- hard to find latest comments across all documents
One to Many
- Embedded tree
- Single document
- Natural
- Hard to query
One to Many
- Embedded tree
- Single document
- Natural
- Hard to query
- Normalized (2 collections)
- most flexible
- more queries
Many - Many
Example:
- Product can be in many categories

- Category can have many products
Products
Category

- product_id
- category_id

Prod_Categories

-  id

-  product_id

-  category_id

Many - Many
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
name: "Sumatra Dark Roast",
category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
ObjectId("4c4ca25433fb5941681b92af”]}
Many - Many
products:
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
name: "Indonesia",
product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
ObjectId("4c4ca30433fb5941681b9130"),
ObjectId("4c4ca30433fb5941681b913a"]}
Many - Many
products:
categories:
name: "Indonesia",
ObjectId("4c4ca30433fb5941681b9130"),
//All categories for a given product

>db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
Many - Many
products:
categories:
name: "Indonesia",
ObjectId("4c4ca30433fb5941681b9130"),
//All categories for a given product

>db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
//All products for a given category

>db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
Alternative
products:
categories:
name: "Indonesia"}
Alternative
products:
categories:
name: "Indonesia"}
// All products for a given category

Alternative
products:
categories:
name: "Indonesia"}
// All products for a given category

// All categories for a given product

product = db.products.find(_id : some_id)
>db.categories.find({_id : {$in : product.category_ids}})
Trees
Full Tree in Document
{ comments: [
{ author: “rpb”, text: “...”,
replies: [
{author: “Fred”, text: “...”,
replies: []}
]}
]}
Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 4MB limit
Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent
Child Links
- Each node contains the id’s of the children
- Can support graphs (multiple parents / child)
Array of Ancestors
- Store Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
{ _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }
Array of Ancestors
{ _id: "a" }
//find all descendants of b:

>db.tree2.find({ancestors: ‘b’})
Array of Ancestors
{ _id: "a" }
//find all descendants of b:

>db.tree2.find({ancestors: ‘b’})
//find all ancestors of f:

>ancestors = db.tree2.findOne({_id:’f’}).ancestors
>db.tree2.find({_id: { $in : ancestors})
findAndModify
Queue example
//Example: find highest priority job and mark
job = db.jobs.findAndModify({ 
query: {inprogress: false},
sort: {priority: -1),
update: {$set: {inprogress: true,
started: new Date()}},
new: true})
Cool Stuff
- Aggregation
- Capped collections
- GridFS
- Geo
Learn More
•  Kyle’s presentation + video:
http://www.slideshare.net/kbanker/mongodb-schema-design
http://www.blip.tv/file/3704083
•  Dwight’s presentation
http://www.slideshare.net/mongosf/schema-design-with-mongodb-dwight-
merriman
•  Documentation
Trees: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
Queues: http://www.mongodb.org/display/DOCS/findandmodify+Command
Aggregration: http://www.mongodb.org/display/DOCS/Aggregation
Capped Col. : http://www.mongodb.org/display/DOCS/Capped+Collections
Geo: http://www.mongodb.org/display/DOCS/Geospatial+Indexing
GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification
Thank You :-)
Download MongoDB
and let us know what you think

@mongodb
http://www.mongodb.org
DBRef
DBRef
{$ref: collection, $id: id_value}
- Think URL
- YDSMV: your driver support may vary
Sample Schema:
nr = {note_refs: [{"$ref" : "notes", "$id" : 5}, ... ]}
Dereferencing:
nr.forEach(function(r) {
printjson(db[r.$ref].findOne({_id: r.$id}));
}
BSON
Mongodb stores data in BSON internally
Lightweight, Traversable, Efficient encoding
Typed
boolean, integer, float, date, string, binary, array...

MongoDB Schema Design Basics

Загружено:

Сведения о документе

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

MongoDB Schema Design Basics

Загружено:

Авторское право:

Доступные форматы

open-­‐source,

‣ Intro ‣ Data modeling

Design documents that simply map to

post = {author: “kyle”,

// 1 means ascending, -1 means descending

// find posts with any tags

// find posts with any tags

// find posts with any tags

new_comment = {author: “fred”,

new_info = { ‘$push’: {comments: new_comment},

>db.posts.update({_id: “...” }, new_info)

// find last 5 posts:

// find last 5 posts:

// most commented post:

When sorting, check if you need an index

Collection in, Collection out

Parallel in sharded environments

reduceFunc = function (k, v) {

res = db.posts.mapReduce(mapFunc, reduceFunc)

// find shapes where radius > 0

- Product can be in many categories

//All categories for a given product

//All categories for a given product

//All products for a given category

// All products for a given category

// All products for a given category

// All categories for a given product

Pros: Single Document, Performance, Intuitive

//find all descendants of b:

//find all descendants of b:

//find all ancestors of f:

//Example: find highest priority job and mark

and let us know what you think

Lightweight, Traversable, Efficient encoding

Вам также может понравиться

open-‐source,

‣  Intro ‣  Data modeling