MongoSV - Schema Design

Schema Design
Alvin Richards
alvin@10gen.com
Topics
Introduction
• Basic Data Modeling
• Evolving a schema
Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues
So why model data?
http://www.flickr.com/photos/42304632@N00/493639870/
A brief history of normalization
• 1970 E.F.Codd introduces 1st Normal Form (1NF)
• 1971 E.F.Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)
Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query
* source : wikipedia
The real benefit of relational
• Before relational
• Data and Logic combined
• After relational
• Separation of concerns
• Data modeled independent of logic
• Logic freed from concerns of data design
• MongoDB continues this separation

Relational made normalized
data look like this
Document databases make
normalized data look like this
Terminology
RDBMS           MongoDB
Table           Collection
Row(s)          JSON Document
Index           Index
Join            Embedding & Linking
Partition       Shard
Partition Key   Shard Key
DB Considerations
How can we manipulate this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce
Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle
Considerations
• No Joins
• Document writes are atomic
So today’s example will use...
Design Session
Design documents that simply map to
your application
post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}
> db.posts.save(post)
Find the document
> db.posts.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
text: "Destination Moon",
tags: [ "comic", "adventure" ]
}
Notes:
• ID must be unique, but can be anything you’d like
• MongoDB will generate a default ID if one is not
supplied
Add an index, find via index
Secondary index for "author"
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})
> db.posts.find({author: 'Hergé'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  author: "Hergé",
  ... }
Verifying indexes exist
> db.system.indexes.find()
// Index on ID

{ name: "_id_",
ns: "test.posts",
key: { "_id" : 1 } }
// Index on author
{ ns: "test.posts",
  key: { "author" : 1 },
  name: "author_1" }
Examine the query plan
> db.posts.find({author: 'Hergé'}).explain()
{
"cursor" : "BtreeCursor author_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 5,
"indexBounds" : {
"author" : [
[
"Hergé",
"Hergé"
]
]
}
}
Query operators
Conditional operators:
$lt, $lte, $gt, $gte, $ne,
$in, $nin, $mod, $all, $size, $exists, $type, ..
// find posts with any tags

> db.posts.find({tags: {$exists: true}})
Query operators

Regular expressions:
// posts where author starts with h
> db.posts.find({author: /^h/i })
Counting:
// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()
Extending the Schema

new_comment = {author: "Kyle",
               date: new Date(),
               text: "great book"}
> db.posts.update(
    {text: "Destination Moon"},
    {"$push": {comments: new_comment},
     "$inc": {comments_count: 1}})

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Hergé",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "Destination Moon",
  tags : [ "comic", "adventure" ],
  comments : [
    {
      author : "Kyle",
      date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
      text : "great book"
    }
  ],
  comments_count: 1
}
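The effect of the update above can be sketched in plain JavaScript, with no database involved. This is only an illustration: `applyCommentUpdate` is a hypothetical helper, not a MongoDB API, and it mimics what `$push` and `$inc` do to one in-memory document.

```javascript
// Hypothetical helper (not part of MongoDB): mimics $push on `comments`
// and $inc on `comments_count` for a single in-memory document.
function applyCommentUpdate(post, comment) {
  // $push creates the array when the field does not exist yet
  if (!Array.isArray(post.comments)) post.comments = [];
  post.comments.push(comment);
  // $inc treats a missing field as 0
  post.comments_count = (post.comments_count || 0) + 1;
  return post;
}

const examplePost = {
  author: "Hergé",
  text: "Destination Moon",
  tags: ["comic", "adventure"],
};
applyCommentUpdate(examplePost, {
  author: "Kyle",
  date: new Date(),
  text: "great book",
});
console.log(examplePost.comments_count, examplePost.comments[0].author);
```

The original post had neither a `comments` nor a `comments_count` field; both appear on first use, which is the schema evolution the slide describes.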

// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Kyle"})
// find last 5 posts:
> db.posts.find().sort({date: -1}).limit(5)
// most commented post:
> db.posts.find().sort({comments_count: -1}).limit(1)
When sorting, check if you need an index

Watch for full table scans
> db.posts.find({text: 'Destination Moon'}).explain()

{
"cursor" : "BasicCursor",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"indexBounds" : {

}
}
Map Reduce
Map reduce : count tags
mapFunc = function () {
  this.tags.forEach(function (z) { emit(z, {count: 1}); });
}
reduceFunc = function (k, v) {
  var total = 0;
  for (var i = 0; i < v.length; i++) {
    total += v[i].count;
  }
  return {count: total};
}
res = db.posts.mapReduce(mapFunc, reduceFunc)
> db[res.result].find()
{ _id : "comic", value : { count : 1 } }
{ _id : "adventure", value : { count : 1 } }
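To see the map, group-by-key, and reduce flow without a server, here is a minimal in-memory sketch. `runMapReduce` is a hypothetical helper, and passing `emit` as an argument is an assumption of this sketch (in the mongo shell `emit` is provided globally to the map function).

```javascript
// Hypothetical in-memory stand-in for the map/reduce flow (not a MongoDB API).
function runMapReduce(docs, mapFn, reduceFn) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // Map phase: call mapFn with `this` bound to each document.
  docs.forEach((doc) => mapFn.call(doc, emit));
  // Reduce phase: keys with a single value are passed through unreduced.
  const results = [];
  for (const [key, values] of emitted) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

const countTagsMap = function (emit) {
  this.tags.forEach((tag) => emit(tag, { count: 1 }));
};
const countTagsReduce = function (key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return { count: total };
};

// Made-up sample data: two posts sharing the "comic" tag.
const tagCounts = runMapReduce(
  [{ tags: ["comic", "adventure"] }, { tags: ["comic"] }],
  countTagsMap,
  countTagsReduce
);
console.log(tagCounts);
```

Because reduce returns the same shape that map emits (`{count: n}`), its output can be fed back into another reduce call, which is why the slide's reducer sums `count` fields rather than counting array entries.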

Group
• Equivalent to a Group By in SQL
• Specify the attributes to group the data by
• Process the results in a Reduce function

Group - Count posts by author
cmd = { key: { "author": true },
        initial: { count: 0 },
        reduce: function(obj, prev) {
          prev.count++;
        }
      };
result = db.posts.group(cmd);
[
{
"author" : "Hergé",
"count" : 1
},
{
"author" : "Kyle",
"count" : 3
}
]
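The same key / initial / reduce shape can be sketched against a plain array. `groupBy` below is a hypothetical helper written for this illustration, not the server's implementation, and the three sample posts are made up.

```javascript
// Hypothetical helper mirroring the pieces of the group command:
// key fields, an initial accumulator, and a reduce(obj, prev) callback.
function groupBy(docs, keyFields, initial, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    const keyValues = keyFields.map((f) => doc[f]);
    const groupKey = JSON.stringify(keyValues);
    if (!groups.has(groupKey)) {
      // Start each group with its key fields plus a shallow copy of `initial`.
      const acc = {};
      keyFields.forEach((f, i) => { acc[f] = keyValues[i]; });
      groups.set(groupKey, Object.assign(acc, initial));
    }
    // The callback mutates the accumulator, as in the slide's reduce function.
    reduce(doc, groups.get(groupKey));
  }
  return [...groups.values()];
}

const postsByAuthor = groupBy(
  [{ author: "Hergé" }, { author: "Kyle" }, { author: "Kyle" }],
  ["author"],
  { count: 0 },
  (obj, prev) => { prev.count++; }
);
console.log(postsByAuthor);
```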
Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more
http://devilseve.blogspot.com/2010/06/like-drinking-from-fire-hose.html
Inheritance
Single Table Inheritance - RDBMS
shapes table
id  type    area  radius  d  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10               5       2
Single Table Inheritance - MongoDB
> db.shapes.find()
{ _id: "1", type: "circle",area: 3.14, radius: 1}
{ _id: "2", type: "square",area: 4, d: 2}
{ _id: "3", type: "rect", area: 10, length: 5, width: 2}
// find shapes where radius > 0

> db.shapes.find({radius: {$gt: 0}})
// create index
> db.shapes.ensureIndex({radius: 1})
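A small in-memory sketch of why this works without NULL-filled columns: documents that lack a `radius` field simply never match the range condition. The array below copies the three shapes from the slide; the filter is a rough stand-in for the query, not MongoDB's matcher.

```javascript
// The three shape documents from the slide, held in a plain array.
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Rough equivalent of find({radius: {$gt: 0}}):
// a document with no numeric `radius` field cannot satisfy the condition.
const withRadius = shapes.filter(
  (s) => typeof s.radius === "number" && s.radius > 0
);
console.log(withRadius.map((s) => s.type));
```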
One to Many
One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle
One to Many
- Embedded Array / Array Keys
  - $slice operator to return a subset of the array
  - some queries are hard, e.g. find the latest comments across all documents
- Embedded tree
  - single document
  - natural
  - hard to query
- Normalized (2 collections)
  - most flexible
  - more queries
One to Many - patterns
- Embedded Array / Array Keys
- Embedded tree
- Normalized
Many - Many
Example:
- Product can be in many categories

- Category can have many products
Many - Many
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Destination Moon",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af") ] }

categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "adventure",
  product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
                 ObjectId("4c4ca30433fb5941681b9130"),
                 ObjectId("4c4ca30433fb5941681b913a") ] }
// All categories for a given product
> db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
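Alternative
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Destination Moon",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af") ] }
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "adventure" }
// All products for a given category
> db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})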
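The trade-off in the alternative layout is that one direction is a single lookup and the other takes two. A minimal in-memory sketch, assuming short string ids in place of ObjectIds and made-up sample data; the function names are hypothetical.

```javascript
// In-memory stand-ins for the two collections. String ids replace ObjectIds
// for readability; the second product and category are made-up sample data.
const products = [
  { _id: "p1", name: "Destination Moon", category_ids: ["c1", "c2"] },
  { _id: "p2", name: "Sample Product", category_ids: ["c1"] },
];
const categories = [
  { _id: "c1", name: "adventure" },
  { _id: "c2", name: "comic" },
];

// One lookup: like find({category_ids: id}), which matches when the
// array field contains the value.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two lookups: fetch the product, then fetch categories whose _id is in
// its category_ids array (the findOne + $in pair from the slide).
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory("c1").map((p) => p.name));
console.log(categoriesOfProduct("p1").map((c) => c.name));
```

Only the products side stores the link, so adding a product to a category is a single-document write.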
Trees
Full Tree in Document
{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "Fred", text: "...",
          replies: [] }
      ] }
  ]
}
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 4MB document size limit (raised to 16MB in later MongoDB releases)
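"Hard to search" becomes concrete once the document is loaded into the application: finding a reply at arbitrary depth needs a recursive walk. A minimal sketch, where `findRepliesByAuthor` is a hypothetical helper and the thread is sample data shaped like the slide's document.

```javascript
// Hypothetical helper: walk a nested comment tree and collect every comment
// by `author`, recording the index path that leads to it.
function findRepliesByAuthor(comments, author, path = []) {
  const hits = [];
  comments.forEach((comment, i) => {
    const here = [...path, i];
    if (comment.author === author) hits.push({ path: here, text: comment.text });
    // Recurse into replies at any depth.
    hits.push(...findRepliesByAuthor(comment.replies || [], author, here));
  });
  return hits;
}

const thread = {
  comments: [
    { author: "Kyle", text: "first",
      replies: [{ author: "Fred", text: "reply", replies: [] }] },
  ],
};
console.log(findRepliesByAuthor(thread.comments, "Fred"));
```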

Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent
Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / children)
Array of Ancestors
- Store all Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
Array of Ancestors
// find all descendants of b:
> db.tree2.find({ancestors: 'b'})
// find all direct descendants of b:
> db.tree2.find({parent: 'b'})
// find all ancestors of f:
> ancestors = db.tree2.findOne({_id: 'f'}).ancestors
> db.tree2.find({_id: {$in: ancestors}})
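The three queries can be checked against the six sample nodes in plain JavaScript. The helper names are hypothetical, and `makeNode` is an added illustration of how the `ancestors` array could be derived when inserting a node; it is an assumption, not something the slides specify.

```javascript
// The sample tree from the slide, held in a plain array.
const nodes = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
];

// All descendants: nodes whose ancestors array contains the id.
const descendantsOf = (id) =>
  nodes.filter((n) => (n.ancestors || []).includes(id)).map((n) => n._id);

// Direct descendants: nodes whose parent is the id.
const childrenOf = (id) =>
  nodes.filter((n) => n.parent === id).map((n) => n._id);

// Ancestors: read the array straight off the node.
const ancestorsOf = (id) => {
  const node = nodes.find((n) => n._id === id);
  return node && node.ancestors ? node.ancestors : [];
};

// Hypothetical helper: a new node's ancestors are its parent's ancestors
// followed by the parent itself.
function makeNode(id, parentId) {
  const parent = nodes.find((n) => n._id === parentId);
  if (!parent) throw new Error("unknown parent: " + parentId);
  return {
    _id: id,
    ancestors: [...(parent.ancestors || []), parentId],
    parent: parentId,
  };
}

console.log(descendantsOf("b"), childrenOf("a"), ancestorsOf("f"));
console.log(makeNode("g", "f"));
```

The cost of the pattern shows up on moves: relocating a subtree means rewriting the `ancestors` array of every node beneath it.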
Trees as Paths
Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. "/"
- Use a regular-expression prefix search to find parts of a tree
{ comments: [
    { author: "Kyle", text: "initial post",
      path: "/" },
    { author: "Jim", text: "jim's comment",
      path: "/jim" },
    { author: "Kyle", text: "Kyle's reply to Jim",
      path: "/jim/kyle" } ] }
// Find the conversations Jim was part of
> db.posts.find({"comments.path": /^\/jim/i})
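Prefix matching on paths can be tried on the three sample comments directly. In this sketch `subtree` is a hypothetical helper; it matches on whole path segments, which is slightly stricter than a bare prefix regular expression (a prefix of `/jim` would also match a path such as `/jimmy`).

```javascript
// The three comments from the slide, with their path expressions.
const comments = [
  { author: "Kyle", text: "initial post", path: "/" },
  { author: "Jim", text: "jim's comment", path: "/jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" },
];

// Hypothetical helper: return the node at `prefix` and everything below it.
// Appending the delimiter keeps "/jim" from matching "/jimmy".
function subtree(commentList, prefix) {
  return commentList.filter(
    (c) => c.path === prefix || c.path.startsWith(prefix + "/")
  );
}

console.log(subtree(comments, "/jim").map((c) => c.text));
```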
Queue
Requirements
• See jobs waiting, jobs in progress
• Ensure that each job is started once and only once
{ inprogress: false,
  priority: 1,
  ...
}
// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
  query: {inprogress: false},
  sort: {priority: -1},
  update: {$set: {inprogress: true,
                  started: new Date()}},
  new: true})
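The select, sort, and mark steps that findAndModify performs as one atomic operation can be sketched in memory. `claimNextJob` is a hypothetical helper and the `id` field is added only to make the sample jobs distinguishable; single-threaded JavaScript gives the "once and only once" property for free here, whereas across multiple workers it depends on the server doing the find and the update atomically.

```javascript
// Hypothetical helper: pick the highest-priority waiting job and mark it
// as in progress, returning the updated job (like `new: true`).
function claimNextJob(jobs) {
  // query: {inprogress: false}
  const waiting = jobs.filter((j) => !j.inprogress);
  if (waiting.length === 0) return null;
  // sort: {priority: -1}; `waiting` is a copy, so `jobs` keeps its order
  waiting.sort((a, b) => b.priority - a.priority);
  const job = waiting[0];
  // update: {$set: {inprogress: true, started: new Date()}}
  job.inprogress = true;
  job.started = new Date();
  return job;
}

const jobQueue = [
  { id: 1, inprogress: false, priority: 1 },
  { id: 2, inprogress: false, priority: 5 },
];
const firstJob = claimNextJob(jobQueue);
const secondJob = claimNextJob(jobQueue);
const thirdJob = claimNextJob(jobQueue);
console.log(firstJob.id, secondJob.id, thirdJob);
```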
Remember me?
http://devilseve.blogspot.com/2010/06/like-drinking-from-fire-hose.html
Summary
Schema design is different in MongoDB
Basic data design principles stay the same
Focus on how the app manipulates data
Rapidly evolve schema to meet your requirements
Enjoy your new freedom, use it wisely :-)

download at mongodb.org
We’re Hiring !
alvin@10gen.com
conferences, appearances, and meetups

http://www.10gen.com/events
Facebook | Twitter | LinkedIn

http://bit.ly/mongo> @mongodb http://linkd.in/joinmongo
