
NoSQL Introduction

NoSQL Introduction
• Understand what NoSQL is and what it is not.
• Why would you want to use NoSQL within your project
and which NoSQL database would you utilize?
• Explore the relationships between NoSQL and RDBMS.
• Understand how to choose among an RDBMS (MySQL
or PostgreSQL), a Document Database (MongoDB), a
Key-Value Store, a Graph Database, a Columnar database, or
combinations of the above.

2
NoSQL Solves Some Problems
• Identify Problems first…
• Don’t implement Solutions just
because they are Awesome

3
NoSQL
• History
• Popular NoSQL Databases
• NoSQL Implementation CRUD Operations

4
NoSQL History
• June 11, 2009 Meetup
– Open Source, Distributed, Non-Relational DB
– Eric Evans (Rackspace)
– Johan Oskarsson (Last.fm)

5
NoSQL History

http://www.w3resource.com/mongodb/nosql.php
6
NoSQL Origination
• Problems not solved by RDBMS
• Limitations of RDBMS, not SQL

7
NoSQL History

8
NoSQL History
• Bad name, but it stuck!
• Not a definitive term
• Generally, newer databases solving
new and different problems
• Not Only SQL

9
Most Popular Databases
http://db-engines.com/en/ranking
Ranking by: Web Content, Web Searches, Technical Discussion, Jobs, Resumes

10
Most Popular NoSQL
• MongoDB - Document Store
• Cassandra – Wide Column Store
• Redis – Key-value store
• Solr – Search Engine
• HBase – Wide Column Store
• Neo4j – Graph Database
• Memcached – Key-value Store
• CouchDB – Document Store
• Riak – Key-value Store
• SimpleDB – Key-value Store within Amazon Cloud

11
Download NoSQL v95.141.3

Released 5/17/2017
http://www.nosql.org/downloads/laeRtoN.zip

12
Reading Recommendations

Great Overview of NoSQL:


Seven Databases in Seven Weeks
Eric Redmond and Jim Wilson

13
2016 NoSQL vs RDBMS

Image Reference: http://blogs.the451group.com/information_management/2012/11/02/updated-database-landscape-graphic/
14


NoSQL Database Types
• Document (JSON)
– MongoDB
• Column Oriented Databases (Columnar)
– Cassandra
– HBase
• Graph
– Neo4j

15
NoSQL Database Types
• Key-Value – Redis, Riak

• Search Database - Solr


• Key-Value Web Optimization - Memcached

16
Key-Value Stores
Key                 Value
code bucket
code:java           17.316% Lowest rank on Feb 2014
code:C              18.334% Lowest rank on August 2013
code:Objective-C    Lowest rank on Dec 2007 11.341%
code:C++            {"score":"6.892%", "low rank": "Feb 2008"}

Key                 Value
drink bucket
drink:java          coffee
drink:punch         Sprite + pineapple juice
drink:pop           Carbonated Soda

http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
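The bucket and key layout on this slide can be sketched with a plain Python dictionary. This is only an illustration of the idea, not any particular product's API: the `put` and `get` helpers are hypothetical names, and the values are copied from the slide. Note that the store never inspects the value, so a string and a nested structure sit side by side.

```python
# Hypothetical in-memory key-value store; keys follow the "bucket:key" convention.
store = {}

def put(bucket, key, value):
    """Store any value under bucket:key; the value is opaque to the store."""
    store[f"{bucket}:{key}"] = value

def get(bucket, key):
    """Return the value for bucket:key, or None if the key is absent."""
    return store.get(f"{bucket}:{key}")

put("code", "java", "17.316% Lowest rank on Feb 2014")
put("code", "C++", {"score": "6.892%", "low rank": "Feb 2008"})
put("drink", "java", "coffee")

# The same key name in two buckets refers to two unrelated values.
code_java = get("code", "java")
drink_java = get("drink", "java")
```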
17
Document Oriented Database
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL", "FP" ],
"awards" : [
{ "award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society" },
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering" }
]
}
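As a rough sketch of how an application reads such a document, the snippet below parses the same JSON with Python's standard `json` module and walks into the nested object and array. The variable names are illustrative assumptions; a real document database would return an equivalent structure through its own driver.

```python
import json

# The document from the slide, parsed into Python dicts and lists.
doc = json.loads("""
{
  "_id": 1,
  "name": {"first": "John", "last": "Backus"},
  "contribs": ["Fortran", "ALGOL", "FP"],
  "awards": [
    {"award": "W.W. McDowell Award", "year": 1967, "by": "IEEE Computer Society"},
    {"award": "Draper Prize", "year": 1993, "by": "National Academy of Engineering"}
  ]
}
""")

# Nested fields are reached without any JOIN: everything lives in one document.
last_name = doc["name"]["last"]
award_years = [award["year"] for award in doc["awards"]]
```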
18
Column Oriented Database

19
Neo4j

20
NoSQL Characteristics
No Predefined Schemas (except for Columnar)
• May insert data without creating a table
• Schema Versions (v1.5, v1.6, v1.7,…)
Rarely Foreign Keys (except for Graph Databases)
• No JOIN operations
• Relationships are not automatically maintained
Eventual Consistency
• Replicated old data eventually replaced by updated data
• Inconsistent data until all replacements are complete
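A toy simulation may make eventual consistency more concrete. It assumes three replicas and a queue of pending updates; the `Replica` class and the `write`, `propagate`, and `read` helpers are hypothetical and do not model any specific database.

```python
class Replica:
    """One copy of the data on one node."""
    def __init__(self):
        self.data = {}

replicas = [Replica() for _ in range(3)]
pending = []  # updates accepted by replica 0 but not yet applied elsewhere

def write(key, value):
    """Apply the write to replica 0 immediately and queue it for the others."""
    replicas[0].data[key] = value
    for replica in replicas[1:]:
        pending.append((replica, key, value))

def propagate():
    """Deliver every queued update, in order, to its target replica."""
    while pending:
        replica, key, value = pending.pop(0)
        replica.data[key] = value

def read(index, key):
    return replicas[index].data.get(key)

write("pop", 6055)
propagate()
write("pop", 7000)
stale = read(2, "pop")   # replica 2 still holds the old value
propagate()
fresh = read(2, "pop")   # after replication completes, all replicas agree
```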

21
NoSQL Database Types
• Document (JSON)
– Schema is continually growing
– Can pre-JOIN records for speed
• Column Oriented Databases (Columnar)
– Quick Aggregate COUNT, AVG, MIN, MAX, SUM
– Sparse Data = Lots of NULL values
• Graph
– Representation of Complex Relationships/JOINs
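The "quick aggregate" point for columnar stores can be illustrated by laying the same records out column by column. This is a simplified sketch, assuming three city records taken from the zips sample shown elsewhere in this deck; real columnar engines add compression and on-disk layout that are not modeled here.

```python
# Row layout: one dict per record, as a row store or document store would hold it.
rows = [
    {"city": "ACMAR", "state": "AL", "pop": 6055},
    {"city": "ADAMSVILLE", "state": "AL", "pop": 10616},
    {"city": "ADGER", "state": "AL", "pop": 3205},
]

# Column layout: one list per column, so an aggregate touches a single list.
columns = {name: [row[name] for row in rows] for name in rows[0]}

total = sum(columns["pop"])            # SUM(pop)
largest = max(columns["pop"])          # MAX(pop)
average = total / len(columns["pop"])  # AVG(pop)
```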
22
CRUD Operations
Create
Read
Update
Delete

23
SQL CRUD
Create
INSERT INTO table (column1, column2) VALUES (9, 'string');
Read
SELECT column1, column2 FROM table;
Update
UPDATE table SET column2 = 'text' WHERE column1 = 9;
Delete
DELETE FROM table WHERE column2 = 'text';
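The four statements can be tried end to end with Python's built-in `sqlite3` module. This is a minimal sketch: the table is named `items` here (an assumption, because `table` is a reserved word in SQLite), and the column types are guessed from the values on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE items (column1 INTEGER, column2 TEXT)")

# Create
conn.execute("INSERT INTO items (column1, column2) VALUES (?, ?)", (9, "string"))
after_create = conn.execute("SELECT column1, column2 FROM items").fetchall()

# Update
conn.execute("UPDATE items SET column2 = ? WHERE column1 = ?", ("text", 9))
after_update = conn.execute("SELECT column1, column2 FROM items").fetchall()

# Delete
conn.execute("DELETE FROM items WHERE column2 = ?", ("text",))
after_delete = conn.execute("SELECT column1, column2 FROM items").fetchall()

conn.close()
```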

24
Database SELECT Statements
Oracle
SELECT * FROM table

MongoDB
db.table.find()

Cassandra (CQL)
SELECT * FROM table

Neo4j (Cypher)
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches a person “n” that likes person “m”

25
Document Oriented Database
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL", "FP" ],
"awards" : [
{ "award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society" },
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering" }
]
}
26
Document Oriented Database
{ "faculty" :
[
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL" ],
"awards" : [
{ "award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society" },
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering" }
]
},
{
"_id" : 2,
"name" : { "first" : "David", "last" : "Williams" },
"contribs" : [ "C#", "Java", "PHP" ],
"awards" : [
{ "award" : "Sherman Peabody Award II",
"year" : 2095,
"location" : "Paris",
"by" : "Intergalactic Continuum" },
{ "award" : "Sherman Peabody Award IX",
"year" : 2090,
"location" : "San Francisco",
"by" : "Intergalactic Continuum" },
{ "award" : "Sherman Peabody Award IV",
"year" : 2093,
"location" : "London",
"by" : "Intergalactic Continuum" }
]
}
]
}

27
Document Oriented Database
http://chris.photobooks.com/json/

28
MongoDB Simple Database
{"city": "ACMAR", "loc": [-86.51557, 33.584132], "pop": 6055, "state": "AL", "_id": "35004"}
{"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"}
{"city": "ADGER", "loc": [-87.167455, 33.434277], "pop": 3205, "state": "AL", "_id": "35006"}
{"city": "KEYSTONE", "loc": [-86.812861, 33.236868], "pop": 14218, "state": "AL", "_id": "35007"}
{"city": "NEW SITE", "loc": [-85.951086, 32.941445], "pop": 19942, "state": "AL", "_id": "35010"}
{"city": "ALPINE", "loc": [-86.208934, 33.331165], "pop": 3062, "state": "AL", "_id": "35014"}
{"city": "ARAB", "loc": [-86.489638, 34.328339], "pop": 13650, "state": "AL", "_id": "35016"}
{"city": "BAILEYTON", "loc": [-86.621299, 34.268298], "pop": 1781, "state": "AL", "_id": "35019"}
{"city": "BESSEMER", "loc": [-86.947547, 33.409002], "pop": 40549, "state": "AL", "_id": "35020"}
{"city": "HUEYTOWN", "loc": [-86.999607, 33.414625], "pop": 39677, "state": "AL", "_id": "35023"}
{"city": "BLOUNTSVILLE", "loc": [-86.568628, 34.092937], "pop": 9058, "state": "AL", "_id": "35031"}
{"city": "BREMEN", "loc": [-87.004281, 33.973664], "pop": 3448, "state": "AL", "_id": "35033"}
{"city": "BRENT", "loc": [-87.211387, 32.93567], "pop": 3791, "state": "AL", "_id": "35034"}
{"city": "BRIERFIELD", "loc": [-86.951672, 33.042747], "pop": 1282, "state": "AL", "_id": "35035"}

{"city": "Logan, UT", "additionally": ["Nibley, UT", "River Heights, UT"], "state": "UT", "version": "2.1", "_id": "84321"}
{"city": "Olivehurst, CA", "additionally": ["Arboga, CA", "Plumas Lake, CA", "West Linda, CA"], "state": "CA", "version": "2.1", "_id": "95961"}

Source: http://media.mongodb.org/zips.json

29
Document Database
Advantages
• Add new columns of data very easily
• Commonly JOIN’d data is pre-collected
• NULL fields can be skipped
• Many-to-many without helper table
Disadvantages
• Must track schema definitions
• Data integrity is limited
30
MongoDB vs SQL
http://docs.mongodb.org/manual/reference/sql-comparison/
Terminology:
MongoDB <-> RDBMS
Collection <-> Table
Document <-> Row
Field <-> Column

31
MongoDB vs SQL CRUD
SELECT (Read)
db.courses.find() = SELECT * FROM courses
db.courses.find({name: "CIS2120"}) = SELECT * FROM courses WHERE name = 'CIS2120'
db.courses.count() = SELECT COUNT(*) FROM courses

UPDATE
db.courses.update({name: "Lehi"}, { $set : { "zip" : "11111" } })

DELETE
db.courses.remove({name: "CIS2120"})

32
MongoDB CRUD
INSERT
db.courses.insert({
  name: "CIS2120",
  description: "Database Coding",
  instructor: {
    name: "David Williams",
    email: "david.williams@usu.edu"
  },
  subjects: ["Python", "MongoDB", "3NF", "ETL", "Star Schema"]
})

33
MongoDB JOIN
• All fields for a document are pre-joined for speedy retrieval
• JOINs were not natively supported before MongoDB 3.2
• $lookup (added in 3.2) provides a lightweight JOIN capability
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
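Conceptually, $lookup behaves like a left outer join that attaches the matching documents from another collection as an array. The sketch below emulates that behavior on plain Python lists; it is not the MongoDB API, and the `lookup` helper, the `instructor_id` field, and the `CIS9999` course are hypothetical names used only for illustration.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Left outer join: every local doc is kept; matches are attached as a list."""
    joined = []
    for doc in local_docs:
        matches = [
            foreign for foreign in foreign_docs
            if foreign.get(foreign_field) == doc.get(local_field)
        ]
        joined.append({**doc, as_field: matches})
    return joined

courses = [
    {"name": "CIS2120", "instructor_id": 1},
    {"name": "CIS9999", "instructor_id": 7},  # no matching instructor
]
instructors = [{"_id": 1, "name": "David Williams"}]

result = lookup(courses, instructors, "instructor_id", "_id", "instructor")
```

A course without a match still appears in the result, with an empty list in the joined field.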

34
Column Oriented Database

35
Column Oriented Database

36
Column Oriented Database

37
May only see notable improvements
with over 1 million records

38
Cassandra CRUD
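CREATE TABLE course (
  name text PRIMARY KEY,
  instructor text,
  maxstudents int
);

INSERT INTO course (name, instructor, maxstudents)
VALUES ('CIS2120', 'Williams', 28);

-- maxstudents is not part of the primary key, so filtering on it
-- requires ALLOW FILTERING (or a secondary index)
SELECT name, instructor FROM course WHERE maxstudents > 20 ALLOW FILTERING;

UPDATE course SET maxstudents = 26 WHERE name = 'CIS2120';

-- Delete the whole row
DELETE FROM course WHERE name = 'CIS2120';

-- Delete individual column values; primary key columns cannot be deleted this way
DELETE instructor, maxstudents FROM course WHERE name = 'CIS2120';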

39
Cassandra CRUD
CREATE TABLE people (
  name text,
  email text PRIMARY KEY,  -- CQL requires a primary key; email is assumed because the UPDATE filters on it
  phones map<text, text>
);

INSERT INTO people (name, email, phones)
VALUES ('John Weeks', 'john.weeks@usu.edu',
  {'mobile' : '555-1212', 'office' : '797-7133', 'fax' : '555-1212'});

UPDATE people SET phones['office'] = '555-1212'
WHERE email = 'john.weeks@usu.edu';

40
Neo4j

41
Neo4j – Graph Database
http://www.neo4j.org/learn/try

http://docs.neo4j.org/refcard/2.0/
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches a person “n” that likes person “m”

42
Neo4j – Graph Database
Game of Thrones:
http://neo4j.com/graphgist/6029850

43
http://neo4j.com/graphgist/c4eab62c-7f5e-4e17-8f75-811d65d83127
http://neo4j.com/graphgist/886c572c-509e-41be-91b5-d74a5ef6d16d
http://neo4j.com/graphgists/?category=health-care-and-science

44
Neo4j
(LUKE {name:"Luke Skywalker"}),
(HAN {name:"Han Solo"}),
(LEIA {name:"Princess Leia Organa"}),
(OBI_WAN {name:"Obi Wan Kenobi"}),
(YODA {name : "Yoda"}),
(VADER {name:"Darth Vader"}),
(C3PO {name:"C3PO", droid:true}),
(R2D2 {name:"R2D2", droid:true}),
(CHEWBACCA {name:"Chewbacca"}),
(TATOOINE {name:"Tatooine", distance:13184}),
(DAGOBAH {name:"Dagobah", distance:15407}),
(JEDI {name:"Jedi"}),
(SITH {name:"Sith"}),
(REBELLION {name:"Rebellion"}),
(EMPIRE {name:"Empire"}),
(DARK_SIDE {name:"Dark Side"}),
(LIGHT_SIDE {name:"Light Side"}),
…
(LUKE)-[:FRIENDS_WITH]->(HAN),
(LUKE)-[:FRIENDS_WITH]->(LEIA),
(HAN)-[:FRIENDS_WITH]->(CHEWBACCA),
(YODA)-[:TEACHES]->(OBI_WAN),
(YODA)-[:TEACHES]->(LUKE),
(OBI_WAN)-[:TEACHES]->(LUKE),
(OBI_WAN)-[:KNOWS]->(VADER),
(LUKE)-[:KNOWS]->(R2D2),
(R2D2)-[:KNOWS]->(C3PO),
(LUKE)-[:LIVED_ON]->(TATOOINE),
(HAN)-[:LIVED_ON]->(CORELLIA),
(LEIA)-[:LIVED_ON]->(ALDERAAN),
(YODA)-[:LIVED_ON]->(DAGOBAH),
(LUKE)-[:DEVOTED_TO]->(JEDI),
(LUKE)-[:DEVOTED_TO]->(REBELLION),
(LUKE)-[:DEVOTED_TO]->(LIGHT_SIDE),
(VADER)-[:DEVOTED_TO]->(SITH),
(VADER)-[:DEVOTED_TO]->(EMPIRE),
(VADER)-[:DEVOTED_TO]->(DARK_SIDE),
(LEIA)-[:DEVOTED_TO]->(REBELLION),
(HAN)-[:DEVOTED_TO]->(REBELLION)
…

MATCH (y)-[r]-(other)
WHERE y.name='Yoda'
RETURN y.name, type(r), other.name

https://gist.github.com/peterneubauer/6019125
http://gist.neo4j.org/?6019125
45
Neo4j CRUD
MATCH (user {name:"Bill"})-[:KNOWS]->(colleague)
WHERE colleague.employer="LinkedIn"
RETURN user,colleague
ORDER BY colleague.name LIMIT 10

MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches every person “n” who likes a person “m”

MATCH (n)-[r]->(m) RETURN n,r,m
Matches any relationship from “n” to “m”
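The pattern matching in these queries can be approximated in a few lines of Python over a list of (node, relationship, node) triples. This is only a sketch of the idea, not how Neo4j evaluates Cypher; the `match` helper is a hypothetical name, and the edges are a small subset of the Star Wars graph used in this deck.

```python
# Each edge is (start node, relationship type, end node).
edges = [
    ("LUKE", "FRIENDS_WITH", "HAN"),
    ("YODA", "TEACHES", "LUKE"),
    ("YODA", "TEACHES", "OBI_WAN"),
    ("YODA", "LIVED_ON", "DAGOBAH"),
    ("OBI_WAN", "TEACHES", "LUKE"),
]

def match(rel_type=None):
    """Emulate MATCH (n)-[r:TYPE]->(m) RETURN n,r,m; no type means any relationship."""
    return [(n, r, m) for (n, r, m) in edges if rel_type is None or r == rel_type]

teaches = match("TEACHES")   # like MATCH (n)-[r:TEACHES]->(m)
everything = match()         # like MATCH (n)-[r]->(m)
yoda_links = [(r, m) for (n, r, m) in edges if n == "YODA"]
```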

46
Neo4j CRUD
UPDATE
MATCH (n)-[r:LIKES]->(m) SET r.weight = 87

DELETE
MATCH (n)-[r:LIKES]->(m) REMOVE r.weight

http://docs.neo4j.org/refcard/2.0/
http://www.neo4j.org/learn/cypher

47
NoSQL Challenges
• Identify a Problem First not a Solution
– Define the business value before spending
– Yet another solution to maintain
• Define Standards and Best Practices
• Concept Education and Technical Training
• MongoDB Schema Change Tracking
• Heterogeneous Interoperability
• Security is often an add-on rather than native
48
Supplemental Slides
OpenWest 2014 NoSQL
Presentation Recording
https://www.youtube.com/watch?v=057ddu0Xsqk&noredirect=1

Download slides from: http://bit.ly/bmdjkw


Supplemental Slides
• NoSQL Comparisons Continued
• Terminology
• Consistency, Replication, Performance
• Redis
• Riak
• HBase

51
2012 NoSQL vs RDBMS

Image Reference: http://blogs.the451group.com/information_management/2012/11/02/updated-database-landscape-graphic/
52


2016 NoSQL vs RDBMS

Image Reference: http://blogs.the451group.com/information_management/2012/11/02/updated-database-landscape-graphic/
53


NoSQL Comparison

Take note of patterns:
Recent Release, Open Source, Utilized at High-Volume sites
Variety of Formats:
Key-Value, Wide-Column, Document, Graph
http://db-engines.com/en/ranking
54
NoSQL Comparison
No ANSI SQL Standards, No Predefined Schemas, Replication, Eventual Consistency, Rarely Foreign Keys, Data Types not required
Newer Concepts: Sharding, REST API, JSON, MapReduce
55
NoSQL Terminology and Concepts

56
Sharding
Partitions
Data distributed across disks

Sharding
Data distributed across servers
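One common way to decide which server holds a record is to hash its key. The sketch below assumes three hypothetical server names and uses a stable hash from `hashlib`, so the same key always lands on the same shard; production systems typically use more elaborate schemes (for example consistent hashing) so that adding a server moves less data.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical shard hosts

def shard_for(key, servers=SERVERS):
    """Map a key to one server using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# Zip codes from the sample data in this deck, spread across the servers.
placement = {zip_code: shard_for(zip_code)
             for zip_code in ["35004", "35005", "84321", "95961"]}
```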

57
Map Reduce
Divides work across distributed systems
Parallel processing of large data sets
Divide – Conquer – Consolidate

Example: 1+2+3+6+7+8+9 = ?
Divide: one worker sums 2+6+8 = 16, another sums 1+7+3+9 = 20
Consolidate: 16 + 20 = 36

Google’s MapReduce Programming Model – Revisited, Ralf Lämmel, Microsoft, 2008
http://www.sciencedirect.com/science/article/pii/S0167642307001281
58
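The divide, conquer, and consolidate steps for the sum on this slide can be sketched in a few lines of Python. The split into two chunks follows the slide; running the chunks in a loop rather than on separate machines is a simplification, and a full MapReduce job would also emit and group key-value pairs.

```python
from functools import reduce

numbers = [1, 2, 3, 6, 7, 8, 9]

# Divide: the work is split into chunks, one per worker.
chunks = [[2, 6, 8], [1, 7, 3, 9]]

# Conquer (map): each worker sums its own chunk independently.
partial_sums = [sum(chunk) for chunk in chunks]

# Consolidate (reduce): the partial results are combined into one answer.
total = reduce(lambda left, right: left + right, partial_sums)
```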
JSON
JavaScript Object Notation, based on a subset of JavaScript syntax
Like XML, a text format for representing data
Syntax
Name : Value pairs
"salary" : "125000"
Values are: number, string, Boolean, array, object, or null
Objects can store Objects, Arrays can store Arrays
Separate pairs by commas
"salary" : "125000", "gender" : "male"
Curly braces denote objects
{ "salary" : "125000", "gender" : "male" }
Square brackets denote arrays
"phone" : ["555-1212", "555-3344"]
"phone" : [ {"office" : "555-1212"}, {"mobile" : "555-3344"} ]
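To see how these syntax rules map onto a programming language, the snippet below parses a small JSON text with Python's standard `json` module. The record combines the pairs from the slide; the `"manager": null` pair is an added assumption, included only to show how null is represented.

```python
import json

text = (
    '{"salary": "125000", "gender": "male", '
    '"phone": [{"office": "555-1212"}, {"mobile": "555-3344"}], '
    '"manager": null}'
)

record = json.loads(text)            # object -> dict, array -> list, null -> None
office = record["phone"][0]["office"]
round_trip = json.dumps(record, sort_keys=True)  # back to JSON text
```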

59
JSON Example
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL", "FP" ],
"awards" : [
{ "award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society" },
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering" }
]
}
http://www.mongodb.com/json-and-bson
60
ACID, BASE, CAP, CPR
1979 Gray, 1983 Reuter & Härder - ACID
Atomic, Consistent, Isolated, Durable
Rollback: All or Nothing, Follows Rules, Simultaneous, No Drops
1997 Brewer - BASE
Basically Available, Soft-state, Eventually consistent
2000 Brewer – CAP (Pick Two)
Consistency, Availability, Partition Tolerance
CPR (Pick Two)
Consistency, Performance, Replication/Redundancy

Contrived - Stretch Definitions


61
CPR (Pick Two)
• Consistency
• Performance
• Redundancy/Replication
63
CPR
Spread data across storage or computers
A B C D
[Diagram: Consistency, Performance, Redundancy]
64
ABCE ABCE ABCD ABCD
Updates may be inconsistent across devices
[Diagram: Consistency, Performance, Redundancy]
65
ABCD ABCD ABCD ABCD
One update locks all nodes
[Diagram: Consistency, Performance, Redundancy]
66
CRUD
Create
Read
Update
Delete

67
Key-Value Stores
Key                 Value
code bucket
code:java           17.316% Lowest rank on Feb 2014
code:C              18.334% Lowest rank on August 2013
code:Objective-C    Lowest rank on Dec 2007 11.341%
code:C++            {"score":"6.892%", "low rank": "Feb 2008"}

Key                 Value
drink bucket
drink:java          coffee
drink:punch         Sprite + pineapple juice
drink:pop           Carbonated Soda

http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
68
Redis CRUD
http://redis.io/commands
http://redis.io/topics/data-types-intro
http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/

Redis is an in-memory Key-Value Store which stores:
Strings, Hashes, Lists, Sets, or Sorted Sets

Strings: values are opaque; fields inside a value cannot be altered individually
SET user:jim "{lastname: 'Mathews', salary: 125000}"
GET user:jim

Hashes: allow modification and retrieval of individual values
69
Redis CRUD
Lists: One-dimensional array with insert, append, pop, and push
redis.lpush('users:employees', 'user:jim')
redis.mget(redis.lrange('users:employees', 0, 5))

Sets: unordered collections with no duplicate values (SADD = Set Add)
SADD users:employees jim
SADD users:employees krishna
SMEMBERS users:employees

Sorted Sets: sets with an added sorting value (score)
ZADD users:employees 125000 jim
ZADD users:employees 157000 krishna
ZRANGEBYSCORE users:employees 100000 180000
70
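The sorted-set commands can be mimicked in plain Python to show what they return. This is an emulation for illustration only, not a Redis client: the `zadd` and `zrangebyscore` functions are hypothetical stand-ins that keep one sorted set in a dictionary of member to score.

```python
scores = {}  # member -> score, standing in for one sorted set

def zadd(score, member):
    """Add a member with a score, or update the score if the member exists."""
    scores[member] = score

def zrangebyscore(low, high):
    """Return members whose score lies in [low, high], ordered by score."""
    ordered = sorted(scores.items(), key=lambda item: (item[1], item[0]))
    return [member for member, score in ordered if low <= score <= high]

zadd(125000, "jim")
zadd(157000, "krishna")

in_range = zrangebyscore(100000, 180000)
top_only = zrangebyscore(150000, 180000)
```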
Riak CRUD
Easy to install and configure test cluster
REST Queries

Create/PUT a “course:CIS2120” row
Key Value
course:CIS2120 {"name":"Database Coding", "days":"MWF"}

curl -v -X PUT http://localhost:8091/riak/course/CIS2120 \
-H "Content-Type: application/json" \
-d '{"name":"Database Coding", "days":"MWF"}'

Read/GET the value for “course:CIS2120”
curl -X GET http://localhost:8091/riak/course/CIS2120
curl http://localhost:8091/riak/course/CIS2120
71
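The same PUT request can be assembled from Python with the standard library. The sketch below only builds the request object so it can be inspected without a running server; the `build_put` helper is a hypothetical name, and the base URL is the one used in the curl commands on this slide.

```python
import json
import urllib.request

def build_put(bucket, key, value, base="http://localhost:8091/riak"):
    """Build (but do not send) a PUT request storing a JSON value at bucket/key."""
    body = json.dumps(value).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{bucket}/{key}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

request = build_put("course", "CIS2120", {"name": "Database Coding", "days": "MWF"})
# urllib.request.urlopen(request) would send it; that call is left out here
# because it needs a Riak node listening on localhost:8091.
```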
Riak Links
Riak can link one key:value pair to another with a
relationship

curl -v -X PUT http://localhost:8091/riak/student/sorensen \
-H "Content-Type: application/json" \
-H "Link: </riak/course/CIS2120>; riaktag=\"enrolled\"" \
-d '{"firstname":"Conner"}'

Links are one-way: this does not automatically create a link
from “CIS2120” back to “sorensen”

72
Google BigTable
• White Paper published in 2006
• Many databases based upon BigTable
• 13 pages, readable for many non-techies
• Insightful into the early days of NoSQL
http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf

73
HBase
Large-Scale, Column-oriented database
Consistency, Performance, Fault-Tolerant, ACID via Locking
Tables are created before initial data is added
Tables have:
– row keys: indexed row identifier strings
– column families: contain one or more columns
– timestamps for version control

74
HBase
Row key is a unifier for column families.
If a row does not insert values in a column family, no disk
space is used within that column family.

Column keys are identified by column_family:column_name
text:
revision:author
revision:comment

Write-Ahead Logging (WAL): similar to file system journaling
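The row key, column family:column, and timestamp layout described for HBase can be pictured as nested dictionaries. The sketch below is a simplified model under that assumption, not the HBase client API; the `put` and `get` helpers, the cell values, and the integer timestamps are all hypothetical.

```python
table = {}  # row key -> column key ("family:column") -> {timestamp: value}

def put(row_key, column, value, timestamp):
    """Store one versioned cell; only columns that are written take up space."""
    table.setdefault(row_key, {}).setdefault(column, {})[timestamp] = value

def get(row_key, column):
    """Return the newest version of a cell, or None if it was never written."""
    versions = table.get(row_key, {}).get(column, {})
    return versions[max(versions)] if versions else None

put("first page", "text:", "draft", 1)
put("first page", "text:", "final", 2)
put("first page", "revision:author", "jim", 2)

latest_text = get("first page", "text:")
missing = get("first page", "revision:comment")
```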
75
HBase CRUD
create 'wiki_table', 'text_column_family', 'revision_column_family'
create 'wiki', 'text', 'revision'
put 'wiki', 'first page', 'text:', '…'
put 'wiki', 'first page', 'revision:author', '…'
get 'wiki', 'first page', ['revision:author', 'revision:comment']
delete 'wiki', 'first page', 'revision:author'
scan 'wiki' = SELECT * FROM wiki

Seven Databases in Seven Weeks, Redmond & Wilson 2012


76
Cassandra Characteristics
Scalable, High-availability Wide-column datastore
Peer-to-peer rather than master-slave clusters
Tunable consistency: can read/write to a single node,
a quorum of nodes, or all nodes
Recommends static and dynamic column families
Static column families contain pre-defined columns
Contact Info: phone, address, email, web
Dynamic column families have variable numbers of similar columns
Students enrolled in a course
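Tunable consistency is often reasoned about with a simple overlap rule: if the number of replicas written plus the number of replicas read exceeds the replication factor, every read touches at least one replica holding the latest write. The sketch below works through that arithmetic; the function names and the replication factor of 3 are assumptions for illustration, not Cassandra settings.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1

def read_overlaps_write(replication_factor, write_nodes, read_nodes):
    """True when any read set must share at least one replica with any write set."""
    return write_nodes + read_nodes > replication_factor

n = 3                      # assumed replication factor
q = quorum(n)

quorum_both = read_overlaps_write(n, q, q)   # quorum writes and quorum reads
single_node = read_overlaps_write(n, 1, 1)   # write one node, read one node
write_all = read_overlaps_write(n, n, 1)     # write all nodes, read one node
```

Writing and reading a single node is the fastest option but gives only eventual consistency; the quorum and write-all settings trade latency for reads that always see the latest write.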

77
http://visualizer.json2html.com/
Additional JSON Visualizer

78
NoSQL Introduction
• NoSQL is a commonly adopted misnomer
• Typically does not use ANSI SQL
– SQL = Structured Query Language
– Structure exists but is more Flexible
– Queries are performed
– Language is closer to Programming Languages

79
Slides and Feedback at: http://joind.in/11012
NoSQL “Bleeding Edge”
• Several solutions are mature and stable
enough to run large scale production
environments
• Not all permutations have been considered
• Several (but not all) optimization strategies
have been published
• Crucial elements such as Security may be a
secondary add-on in favor of performance.
80
NoSQL “Bleeding Edge”
Sun Microsystems csh man page:
“Although robust enough for
general use, adventures into the
esoteric periphery of the C shell
may reveal unexpected quirks.”

81