
A Programmatic Introduction to Neo4j

Dr. Jim Webber


Chief Scientist, Neo Technology
@jimwebber

Roadmap
NOSQL overview
Neo4j
What's fabulous in 1.3?
Some hacking

Solutions architecture and big data

Why NOSQL now?


Driving trends

Trend 1: Data Size



Trend 2: Connectedness
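[Figure: information connectivity plotted against time (1990, 2000, 2010, 2020), spanning web 1.0, web 2.0 and web 3.0. Labels from least to most connected: text documents, hypertext, feeds, blogs, UGC, wikis, tagging, folksonomies, RDFa, ontologies, GGG (Giant Global Graph).]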

Trend 3: Semi-structured information


Individualisation of content
1970s salary lists: all elements had exactly one job
2000s salary lists: we need many job columns!

All-encompassing "entire world views"

Store more data about each entity
Trend accelerated by the decentralization of content generation
Age of participation ("web 2.0")

Side note: RDBMS performance


[Figure: relational database performance plotted against data complexity, with the requirement of the application marked. Points along the data-complexity axis: salary list, majority of webapps, social network, semantic trading.]

Four NOSQL Categories

Key-Value Stores
"Dynamo: Amazon's Highly Available Key-Value Store" (2007)
Data model:
Global key-value mapping
Big scalable HashMap
Highly fault tolerant (typically)

Examples:
Riak, Redis, Voldemort

Pros and Cons


Strengths
Simple data model
Great at scaling out horizontally
Scalable
Available

Weaknesses:
Simplistic data model
Poor for complex data

Column Family (BigTable)


Google's "Bigtable: A Distributed Storage System for Structured Data" (2006)
Data model:
A big table, with column families
Map-reduce for querying/processing

Examples:
HBase, HyperTable, Cassandra

Pros and Cons


Strengths
Data model supports semi-structured data
Naturally indexed (columns)
Good at scaling out horizontally

Weaknesses:
Unsuited for interconnected data

Document Databases
Data model
Collections of documents
A document is a key-value collection
Index-centric, lots of map-reduce

Examples
CouchDB, MongoDB

Pros and Cons


Strengths
Simple, powerful data model (just like SVN!)
Good scaling (especially if sharding supported)

Weaknesses:
Unsuited for interconnected data
Query model limited to keys (and indexes)
Map reduce for larger queries

Graph Databases
Data model:
Nodes with properties
Named relationships with properties
Hypergraph, sometimes

Examples:
Neo4j (of course), Sones GraphDB, OrientDB, InfiniteGraph, AllegroGraph

Pros and Cons


Strengths
Powerful data model
Fast
For connected data, can be many orders of magnitude
faster than RDBMS

Weaknesses:
Sharding
Though they can scale reasonably well
And for some domains you can shard too!

Social Network "path exists" Performance

Experiment:
~1k persons
Average 50 friends per person
pathExists(a,b) limited to depth 4
Caches warm to eliminate disk IO

Database              # persons    Query time
Relational database   1,000        2,000 ms
Neo4j                 1,000        2 ms
Neo4j                 1,000,000    2 ms
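To make the pathExists(a,b) query concrete, here is a minimal sketch in plain Java of a depth-limited, breadth-first reachability check over an in-memory friends map. It is not Neo4j code and not the benchmark's implementation: the class name, the map representation and the sample names are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PathExists {

    /**
     * Returns true if 'to' can be reached from 'from' in at most maxDepth
     * friendship hops. Breadth-first, expanding each person at most once.
     */
    static boolean pathExists(Map<String, List<String>> friends,
                              String from, String to, int maxDepth) {
        if (from.equals(to)) {
            return true;
        }
        Set<String> visited = new HashSet<>();
        visited.add(from);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(from);

        // Each pass of the loop expands the frontier by one hop.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String person : frontier) {
                for (String friend : friends.getOrDefault(person, List.of())) {
                    if (friend.equals(to)) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical chain of friends: a - b - c - d - e - f
        Map<String, List<String>> friends = new HashMap<>();
        friends.put("a", List.of("b"));
        friends.put("b", List.of("a", "c"));
        friends.put("c", List.of("b", "d"));
        friends.put("d", List.of("c", "e"));
        friends.put("e", List.of("d", "f"));
        friends.put("f", List.of("e"));

        System.out.println(pathExists(friends, "a", "e", 4)); // true: e is 4 hops away
        System.out.println(pathExists(friends, "a", "f", 4)); // false: f is 5 hops away
    }
}
```

The work done depends on how many people lie within four hops of the start, not on the total number of people stored, which is one way to read the flat 2 ms figure in the table.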

What are graphs good for?

Recommendations
Business intelligence
Social computing
Geospatial
MDM
Systems management
Web of things
Genealogy
Time series data
Product catalogue
Web analytics
Scientific computing (especially bioinformatics)
Indexing your slow RDBMS
And much more!

Neo4j is a Graph Database


So we need to detour through a little graph theory

Meet Leonhard Euler


Swiss mathematician
Inventor of Graph Theory (1736)

http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg

On maturity of data models


[Figure: bar chart on a scale of 0 to 300 comparing most NoSQL stores, RDBMS, and graph stores]

Property Graph Model

[Figure: a property graph whose relationships are labelled TRAVELS_WITH (twice) and LOVES. The nodes carry these properties:]

first name: Rose
last name: Tyler

vehicle: tardis
model: Type 40

name: the Doctor
age: 907
species: Time Lord
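The model on this slide can be sketched in a few lines of plain Java: nodes hold properties, and relationships are named, directed and can hold properties too. This is an illustration of the data model only, not Neo4j's API. The class names are made up, and which node each relationship starts and ends at is an assumption, since the arrows on the slide did not survive extraction.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PropertyGraphSketch {

    /** A node is a bag of key-value properties. */
    static class Node {
        final Map<String, Object> properties = new LinkedHashMap<>();

        Node with(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    /** A relationship is named, directed (start to end) and can hold properties too. */
    static class Relationship {
        final Node start;
        final String type;
        final Node end;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Relationship(Node start, String type, Node end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    static class Graph {
        final List<Node> nodes = new ArrayList<>();
        final List<Relationship> relationships = new ArrayList<>();

        Node createNode() {
            Node node = new Node();
            nodes.add(node);
            return node;
        }

        Relationship relate(Node start, String type, Node end) {
            Relationship rel = new Relationship(start, type, end);
            relationships.add(rel);
            return rel;
        }
    }

    /** Builds the slide's example. Relationship directions are assumed. */
    static Graph doctorWhoGraph() {
        Graph graph = new Graph();
        Node rose = graph.createNode()
                .with("first name", "Rose").with("last name", "Tyler");
        Node tardis = graph.createNode()
                .with("vehicle", "tardis").with("model", "Type 40");
        Node doctor = graph.createNode()
                .with("name", "the Doctor").with("age", 907).with("species", "Time Lord");

        graph.relate(rose, "LOVES", doctor);          // assumed direction
        graph.relate(rose, "TRAVELS_WITH", doctor);   // assumed direction
        graph.relate(doctor, "TRAVELS_WITH", tardis); // assumed direction
        return graph;
    }

    public static void main(String[] args) {
        for (Relationship rel : doctorWhoGraph().relationships) {
            System.out.println(rel.start.properties + " -[" + rel.type + "]-> " + rel.end.properties);
        }
    }
}
```

Note that nothing forces two nodes to share the same property keys, which is what makes the model schema-free.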

Graphs are very whiteboard-friendly

This has huge design implications

What you end up with
What you know

Your awesome new graph database

http://talent-dynamics.com/tag/sqaure-peg-round-hole/

Schema-less Databases
Graph databases don't excuse you from design
Any more than dynamically typed languages excuse you from design

Good design still requires effort
But less difficult than RDBMS because you don't need to normalise
And then de-normalise!

Neo4j

What's Neo4j?
It's a graph database
Embeddable and server
Full ACID transactions
We don't mess around with durability, ever.

Schema free, bottom-up data model design

More on Neo4j
Neo4j is stable
In 24/7 operation since 2003

Neo4j is under active development

High performance graph operations
Traverses 1,000,000+ relationships / second on commodity hardware

What's new in 1.3?

32B nodes/rels and 64B properties
Compact footprint (short string)
New index API
New visualisation tool
Bigger graph algo library
Dijkstra for shortest paths

Better REST API

Big news: License Changes in 1.3 onwards

Community: GPL
The core graph db functionality, including server, Webadmin tool
Free as in beer

Advanced: AGPL/commercial
Management features, commercial grade support

Enterprise: AGPL/commercial
HA

NOSQL is simply

Not Only SQL


So how do we query it?

Image credit: http://browsertoolkit.com/fault-tolerance.png


How do I use it?

Getting started is easy

Single package download, includes server stuff
http://neo4j.org/download/

For developer convenience, Ivy (or whatever):


<dependency org="org.neo4j" name="neo4j-community" rev="1.3"/>

Run it!
Server is easy to start and stop
cd <install directory>
bin/neo4j start
bin/neo4j stop

Provides a REST API in addition to the other APIs we've seen
Provides some ops support
JMX, data browser, graph visualisation

Embed it!
If you want to host the database in your
process just load the jars
And point the config at the right place on disk
Embedded databases can be HA too
You don't have to run as a server

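Creating Nodes
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

// Open (or create) an embedded database stored at /tmp/neo
GraphDatabaseService db = new EmbeddedGraphDatabase("/tmp/neo");
Transaction tx = db.beginTx();
try {
    Node theDoctor = db.createNode();
    theDoctor.setProperty("name", "the Doctor");
    tx.success(); // mark the transaction as successful
} finally {
    tx.finish(); // commits if success() was called, otherwise rolls back
}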

Creating Relationships
import org.neo4j.graphdb.DynamicRelationshipType;

Transaction tx = db.beginTx();
try {
    Node theDoctor = db.createNode();
    theDoctor.setProperty("name", "The Doctor");
    Node susan = db.createNode();
    susan.setProperty("firstname", "Susan");
    susan.setProperty("lastname", "Campbell");
    // Relationships are named and directed: susan -[COMPANION_OF]-> theDoctor
    susan.createRelationshipTo(theDoctor,
            DynamicRelationshipType.withName("COMPANION_OF"));
    tx.success();
} finally {
    tx.finish();
}

Repeat... until

The Enemy of my Enemy is my?


Node theMaster =
Node dalek =
Node cyberman
Traverser traverser = Traversal.description()
        .expand(Traversal.expanderForTypes(DoctorWhoUniverse.ENEMY_OF,
                Direction.OUTGOING))
        .depthFirst()
        .evaluator(new Evaluator() {
            public Evaluation evaluate(Path path) {
                // Only include if we're at depth 2, for enemy-of-enemy
                if (path.length() == 2) {
                    return Evaluation.INCLUDE_AND_PRUNE;
                } else if (path.length() > 2) {
                    return Evaluation.EXCLUDE_AND_PRUNE;
                } else {
                    return Evaluation.EXCLUDE_AND_CONTINUE;
                }
            }
        })
        .uniqueness(Uniqueness.NODE_GLOBAL)
        .traverse(theMaster);

Graph Algorithms
The Doctor and the Master have been around for a while
But what's the key feature of their relationship?
They're both Time Lords, they both come from Gallifrey, they pilot a Tardis, they've fought

Graph algorithms can help
They're pre-canned, well-known traversals

Shortest Path
What's the most direct path between the Doctor and the Master?

Node theMaster =
Node theDoctor =

int maxDepth = 5;
PathFinder<Path> shortestPathFinder = GraphAlgoFactory.shortestPath(
        Traversal.expanderForAllTypes(), maxDepth);
Path shortestPath = shortestPathFinder.findSinglePath(theDoctor, theMaster);

Graph matching
It's super-powerful to look for patterns in a data set
E.g. retail analytics

Higher-level abstraction than raw traversers
You do less work!

In which episodes did the Doctor battle the Cybermen?

Setting up and matching a pattern


final PatternNode theDoctor = new PatternNode();
theDoctor.setAssociation(universe.theDoctor());

final PatternNode anEpisode = new PatternNode();
anEpisode.addPropertyConstraint("title", CommonValueMatchers.has());
anEpisode.addPropertyConstraint("episode", CommonValueMatchers.has());

final PatternNode aDoctorActor = new PatternNode();
aDoctorActor.createRelationshipTo(theDoctor, DoctorWhoUniverse.PLAYED);
aDoctorActor.createRelationshipTo(anEpisode, DoctorWhoUniverse.APPEARED_IN);
aDoctorActor.addPropertyConstraint("actor", CommonValueMatchers.has());

final PatternNode theCybermen = new PatternNode();
theCybermen.setAssociation(
        universe.speciesIndex.get("species", "Cyberman").getSingle());
theCybermen.createRelationshipTo(anEpisode, DoctorWhoUniverse.APPEARED_IN);
theCybermen.createRelationshipTo(theDoctor, DoctorWhoUniverse.ENEMY_OF);

PatternMatcher matcher = PatternMatcher.getMatcher();
final Iterable<PatternMatch> matches =
        matcher.match(theDoctor, universe.theDoctor());

Koans

Because if you do not know it yourself, it does you no good


https://github.com/jimwebber/neo4j-tutorial

Ops and Big Data


Neo4j in Production, in the Large

Scaling graphs is hard

Black Hole server

Chatty Network

Minimal Point Cut

So how do we scale Neo4j?

Neo4j Logical Architecture


REST API

Java

Ruby

Clojure

JVM Language Bindings

Traversal Framework
Core API
Caches
Memory-Mapped (N)IO
Filesystem

A Humble Blade
Blades are powerful!
A typical blade will contain 128GB memory
We can use most of that

If O(dataset) ≤ O(memory) then we're going to be very fast
Remember we can do millions of traversals per second if the caches are warm

Cache Sharding
A strategy for coping with large data sets
Terabyte scale

Too big to hold all in RAM on a single server
But not so big that replicating it on disk is a worry

Use each blade's main memory to cache part of the dataset
Try to keep caches warm
Full data is replicated on each rack

Consistent Routing
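One way to read consistent routing: send every request for the same key (say, a user id) to the same blade, so that blade's cache stays warm for that part of the graph. The sketch below is plain Java and uses simple modulo hashing; the class name, the server names and the choice of routing key are assumptions for illustration, not something the slides specify.

```java
import java.util.List;

public class ConsistentRouter {

    private final List<String> servers;

    ConsistentRouter(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = List.copyOf(servers);
    }

    /**
     * Maps a routing key to a server. The same key always lands on the same
     * server for as long as the server list is unchanged.
     */
    String route(String routingKey) {
        // floorMod keeps the bucket non-negative even for negative hash codes.
        int bucket = Math.floorMod(routingKey.hashCode(), servers.size());
        return servers.get(bucket);
    }

    public static void main(String[] args) {
        // Hypothetical blade names.
        ConsistentRouter router =
                new ConsistentRouter(List.of("blade-1", "blade-2", "blade-3"));
        System.out.println("rose       -> " + router.route("rose"));
        System.out.println("rose again -> " + router.route("rose"));
        System.out.println("the-doctor -> " + router.route("the-doctor"));
    }
}
```

Modulo hashing reshuffles most keys whenever a blade is added or removed, which chills every cache at once. A consistent-hash ring would limit that churn, at the cost of a little more code.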

Domain-specific sharding
Eventually (at petabyte level) data cannot practically be replicated
Need to shard data across machines
Remember: no perfect algorithm exists
But we humans sometimes have domain insight

Summary
Neo4j 1.3 community edition is free as in beer
Graphs are extremely expressive for modeling
Neo4j is fast at graph traversals
No more multi-join woes
No more insane indexes
No more map reduce

Other language bindings exist :-)

Questions?
http://neo4j.org

@jimwebber
