Alvin Richards
alvin@10gen.com
Topics
Introduction
• Basic Data Modeling
• Evolving a schema
Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues
So why model data?
http://www.flickr.com/photos/42304632@N00/493639870/
A brief history of normalization
• 1970 E.F.Codd introduces 1st Normal Form (1NF)
• 1971 E.F.Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)
Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query
* Source: Wikipedia
The real benefit of relational
• Before relational
• Data and Logic combined
• After relational
• Separation of concerns
• Data modeled independent of logic
• Logic freed from concerns of data design
RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key
DB Considerations
• How can we manipulate this data?
• Access patterns?
> db.posts.save(post)
Find the document
> db.posts.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Hergé",
  date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text: "Destination Moon",
  tags: [ "comic", "adventure" ] }
Notes:
• ID must be unique, but can be anything you’d like
• MongoDB will generate a default ID if one is not supplied
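The two notes above can be tried without a server. What follows is a minimal plain-JavaScript sketch, not MongoDB's implementation: `store`, `save` and `nextId` are hypothetical helpers, and the counter-based ID is an assumption made for brevity (MongoDB itself generates an ObjectId).

```javascript
// In-memory stand-in for a collection keyed by _id (hypothetical, not the MongoDB API).
const store = new Map();
let counter = 0;

// Hypothetical ID generator; a real server would generate an ObjectId instead.
function nextId() {
  counter += 1;
  return "auto-" + counter;
}

// Insert or overwrite a document, generating _id when none is supplied.
function save(doc) {
  const stored = Object.assign({}, doc);
  if (stored._id === undefined) {
    stored._id = nextId();
  }
  store.set(stored._id, stored); // saving with an existing _id replaces that document
  return stored._id;
}

const generated = save({ author: "Hergé", text: "Destination Moon" });
const chosen = save({ _id: "my-key", author: "Kyle" });
save({ _id: "my-key", author: "Kyle", text: "updated" });
console.log(generated, chosen, store.size);
```

The ID can be any value the application likes, as long as it stays unique within the collection.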
Add an index, find via index
Secondary index for "author"
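To make the idea of a secondary index concrete, here is a rough plain-JavaScript sketch. It is an illustration only: `buildIndex` and `findByAuthor` are hypothetical names, the second Hergé title is made-up sample data, and a `Map` is a simplification of the index structure a real database maintains.

```javascript
const posts = [
  { _id: 1, author: "Hergé", text: "Destination Moon" },
  { _id: 2, author: "Kyle", text: "great book" },
  { _id: 3, author: "Hergé", text: "Explorers on the Moon" }, // made-up sample row
];

// Build a secondary index: field value -> list of _id values.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

const byAuthor = buildIndex(posts, "author");
const byId = new Map(posts.map((p) => [p._id, p]));

// Look up through the index instead of scanning every document.
function findByAuthor(author) {
  return (byAuthor.get(author) || []).map((id) => byId.get(id));
}

const hergePosts = findByAuthor("Hergé");
console.log(hergePosts.map((p) => p.text));
```

The trade-off is the usual one: reads by author avoid a full scan, while every write also has to update the index.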
Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
$lt, $lte, $gt, $gte
Regular expressions:
// posts where author starts with h
> db.posts.find({author: /^h/i})
Counting:
// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()
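The query operators listed above can be read as predicates applied to one field at a time. The sketch below is a simplified plain-JavaScript model under stated assumptions: `matches` and `count` are hypothetical helpers, only a handful of operators are covered, array-valued fields are ignored, and the `harry` document is made-up sample data.

```javascript
// Each operator maps to a predicate over (fieldValue, operand).
const operators = {
  $gt: (v, x) => v > x,
  $gte: (v, x) => v >= x,
  $lt: (v, x) => v < x,
  $lte: (v, x) => v <= x,
  $ne: (v, x) => v !== x,
  $in: (v, xs) => xs.includes(v),
  $nin: (v, xs) => !xs.includes(v),
  $exists: (v, flag) => (v !== undefined) === flag,
};

// Test one document against a query of the form {field: condition}.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond instanceof RegExp) return typeof value === "string" && cond.test(value);
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      // Every operator in the condition must hold for the field.
      return Object.entries(cond).every(([op, operand]) => operators[op](value, operand));
    }
    return value === cond; // plain equality
  });
}

const posts = [
  { author: "Hergé", comments_count: 1 },
  { author: "Kyle", comments_count: 3 },
  { author: "harry" }, // made-up sample row without a counter
];
const count = (query) => posts.filter((p) => matches(p, query)).length;

console.log(count({ author: /^h/i }));
console.log(count({ comments_count: { $gte: 1, $lt: 3 } }));
console.log(count({ author: { $in: ["Kyle", "Fred"] } }));
console.log(count({ comments_count: { $exists: false } }));
```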
Extending the Schema
> new_comment = {author: "Kyle", date: new Date(), text: "great book"}
> db.posts.update(
    {text: "Destination Moon"},
    {'$push': {comments: new_comment},
     '$inc': {comments_count: 1}})
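What the update above does to the stored document can be sketched in plain JavaScript. This is an approximation for illustration, not the server's code: `applyUpdate` is a hypothetical helper that handles only `$push` and `$inc`, and it assumes the target fields are either missing or already of the right type.

```javascript
// Apply a small subset of update operators to a plain object, in place.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = []; // the array is created on first push
    doc[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // a missing counter starts from 0
  }
  return doc;
}

const post = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Kyle", date: new Date(), text: "great book" };

applyUpdate(post, { $push: { comments: newComment }, $inc: { comments_count: 1 } });
console.log(post.comments.length, post.comments_count);
```

Neither `comments` nor `comments_count` existed before the update, which is the point of the slide: the schema is extended by writing the new fields.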
Extending the Schema
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Hergé",
  date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text: "Destination Moon",
  tags: [ "comic", "adventure" ],
  comments: [
    { author: "Kyle",
      date: "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
      text: "great book" }
  ],
  comments_count: 1 }
Extending the Schema
// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Kyle"})
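A dotted key such as "comments.author" reaches into every element of the embedded array. The following plain-JavaScript sketch shows one way to picture that lookup; `valuesAt` and `findByPath` are hypothetical helpers, and the second post is made-up sample data.

```javascript
// Collect every value reachable via a dotted path, descending into arrays.
function valuesAt(value, path) {
  if (path.length === 0) return Array.isArray(value) ? value : [value];
  if (Array.isArray(value)) return value.flatMap((item) => valuesAt(item, path));
  if (value === null || typeof value !== "object") return [];
  return valuesAt(value[path[0]], path.slice(1));
}

// Keep the documents where any value at the path equals the wanted value.
function findByPath(docs, dottedPath, wanted) {
  const path = dottedPath.split(".");
  return docs.filter((doc) => valuesAt(doc, path).includes(wanted));
}

const posts = [
  { text: "Destination Moon", comments: [{ author: "Kyle", text: "great book" }] },
  { text: "No comments yet" }, // made-up sample row
];

const hits = findByPath(posts, "comments.author", "Kyle");
console.log(hits.map((p) => p.text));
```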
> db[res.result].find()
{ _id: "comic", value: { count: 1 } }
{ _id: "adventure", value: { count: 1 } }
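The output above has the shape of a map-reduce result that counts posts per tag. The job that produced it is not shown here, so the following is only an assumed reconstruction in plain JavaScript: `mapPost`, `reduceCounts` and `mapReduce` are hypothetical names, not the shell's API.

```javascript
const posts = [{ text: "Destination Moon", tags: ["comic", "adventure"] }];

// Map step: emit (tag, {count: 1}) for every tag of every post.
function mapPost(post, emit) {
  for (const tag of post.tags || []) emit(tag, { count: 1 });
}

// Reduce step: combine all values emitted for one key.
function reduceCounts(key, values) {
  return { count: values.reduce((sum, v) => sum + v.count, 0) };
}

// Tiny driver: group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const tagCounts = mapReduce(posts, mapPost, reduceCounts);
console.log(JSON.stringify(tagCounts));
```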
Group
[ { "author": "Hergé", "count": 1 },
  { "author": "Kyle", "count": 3 } ]
Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more
http://devilseve.blogspot.com/2010/06/like-drinking-from-fire-hose.html
Inheritance
Single Table Inheritance - RDBMS
shapes table
id  type    area  radius  d  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10               5       2
Single Table Inheritance - MongoDB
> db.shapes.find()
{ _id: "1", type: "circle", area: 3.14, radius: 1 }
{ _id: "2", type: "square", area: 4, d: 2 }
{ _id: "3", type: "rect", area: 10, length: 5, width: 2 }
// create index
> db.shapes.ensureIndex({radius: 1})
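Because each shape document carries only the fields its type needs, a query on `radius` simply skips documents without that field. A small plain-JavaScript sketch of that behaviour follows; `withRadiusAbove` and `fieldUsage` are hypothetical helpers written for this illustration.

```javascript
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Shapes whose radius exceeds the bound; documents without the field never match.
function withRadiusAbove(docs, bound) {
  return docs.filter((s) => typeof s.radius === "number" && s.radius > bound);
}

// Count how many documents use each field: shared fields versus type-specific ones.
function fieldUsage(docs) {
  const usage = {};
  for (const doc of docs) {
    for (const field of Object.keys(doc)) usage[field] = (usage[field] || 0) + 1;
  }
  return usage;
}

const round = withRadiusAbove(shapes, 0);
const usage = fieldUsage(shapes);
console.log(round.map((s) => s._id), usage);
```

Compared with the relational shapes table, there are no empty columns to carry around: `area` appears in all three documents, `radius` in one.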
One to Many
One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle
One to Many
- Embedded Array / Array Keys
  - slice operator to return subset of array
  - some queries hard, e.g. find latest comments across all documents
- Embedded tree
  - Single document
  - Natural
  - Hard to query
- Normalized (2 collections)
  - most flexible
  - more queries
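The trade-off between the embedded and normalized options can be sketched in plain JavaScript. Everything here is illustrative: the helper names, the numeric `ts` timestamps and the sample comments are assumptions, not part of the deck.

```javascript
// Embedded: comments live inside each post.
const embeddedPosts = [
  { _id: "p1", comments: [{ author: "Kyle", ts: 1 }, { author: "Fred", ts: 4 }] },
  { _id: "p2", comments: [{ author: "Jim", ts: 3 }] },
];

// Returning only the last n comments of one post is easy (the job of a slice).
function lastComments(post, n) {
  return post.comments.slice(-n);
}

// Latest comments across all posts means flattening every array first.
function latestAcrossEmbedded(posts, n) {
  return posts
    .flatMap((p) => p.comments.map((c) => ({ post: p._id, ...c })))
    .sort((a, b) => b.ts - a.ts)
    .slice(0, n);
}

// Normalized: comments form their own collection and link back to a post.
const commentsCollection = [
  { post: "p1", author: "Kyle", ts: 1 },
  { post: "p1", author: "Fred", ts: 4 },
  { post: "p2", author: "Jim", ts: 3 },
];

// The cross-post query is a plain sort, but showing a post with its comments needs a second lookup.
function latestAcrossNormalized(comments, n) {
  return [...comments].sort((a, b) => b.ts - a.ts).slice(0, n);
}

console.log(latestAcrossEmbedded(embeddedPosts, 2).map((c) => c.author));
console.log(latestAcrossNormalized(commentsCollection, 2).map((c) => c.author));
```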
One to Many - patterns
{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "Fred", text: "...", replies: [] }
      ] }
  ] }
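The embedded tree above is natural to store but awkward to query, because a match can sit at any depth. A plain-JavaScript sketch of the recursive walk an application would need is shown below; `findByAuthor` is a hypothetical helper.

```javascript
const post = {
  comments: [
    { author: "Kyle", text: "...", replies: [{ author: "Fred", text: "...", replies: [] }] },
  ],
};

// Depth-first walk collecting every comment by one author, with its depth in the thread.
function findByAuthor(comments, author, depth = 0) {
  const found = [];
  for (const comment of comments) {
    if (comment.author === author) found.push({ depth, comment });
    found.push(...findByAuthor(comment.replies || [], author, depth + 1));
  }
  return found;
}

const fredReplies = findByAuthor(post.comments, "Fred");
console.log(fredReplies.map((r) => r.depth));
```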
Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent
Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / children)
Array of Ancestors
- Store all Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
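With the ancestors stored on every node, the common tree questions become simple membership tests. Here they are as a plain-JavaScript sketch over the same six nodes; the helper names are hypothetical.

```javascript
const tree = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
];

const ids = (nodes) => nodes.map((n) => n._id);

// All descendants: every node whose ancestors array contains the id.
const descendantsOf = (id) => tree.filter((n) => (n.ancestors || []).includes(id));

// Direct children: nodes whose parent is the id.
const childrenOf = (id) => tree.filter((n) => n.parent === id);

// Ancestors are read straight off the node, with no traversal.
const ancestorsOf = (id) => (tree.find((n) => n._id === id) || {}).ancestors || [];

console.log(ids(descendantsOf("b")), ids(childrenOf("a")), ancestorsOf("f"));
```

The cost is paid on writes: moving a node means rewriting the `ancestors` array of the node and of everything beneath it.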
{ comments: [
    { author: "Kyle", text: "initial post", path: "/" },
    { author: "Jim", text: "jim's comment", path: "/jim" },
    { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" }
  ] }
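Storing the full path on each comment turns "this comment and everything under it" into a prefix match. A plain-JavaScript sketch follows; `escapeRegExp` and `subtree` are hypothetical helpers, and the root path "/" would need special handling that is left out here.

```javascript
const comments = [
  { author: "Kyle", text: "initial post", path: "/" },
  { author: "Jim", text: "jim's comment", path: "/jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" },
];

// Escape regex metacharacters so a path is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// A comment and everything beneath it share the same path prefix.
function subtree(items, path) {
  const prefix = new RegExp("^" + escapeRegExp(path) + "(/|$)");
  return items.filter((c) => prefix.test(c.path));
}

const jimThread = subtree(comments, "/jim");
console.log(jimThread.map((c) => c.text));
```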
Summary
We’re Hiring !
alvin@10gen.com