NoSQL UG Paris: IMDG in Action with Coherence (without Transactions chapter)

Transactions chapter will be presented during another session
In Memory Data Grid in Action with Oracle Coherence for Paris NoSQL User Group
Cyrille Le Clerc
Wednesday, May 25, 2011
Speaker
@cyrilleleclerc blog.xebia.fr Cyrille Le Clerc
Large Scale In Memory Data Grid
Open Source
(Apache CXF, ...)
you build it, you run it

2
Once upon a time...
3
On the Financial side

Needs within the financial markets:
- Very low latency
- Rich queries & transactions
- Scalability
- Data consistency
Products:
- Coherence: released in 2001, started as a distributed cache
- GigaSpaces XAP: released in 2001, started as a data grid
4
Let's define an In Memory Data Grid ...
5
Let's define an In Memory Data Grid
eXtreme Scale
This is an In Memory Data Grid

6
This is Network Attached Memory

7
Similarities with NoSQL document oriented

Partitioned, distributed hashtable, schema-less, value is not opaque, scale-out scalability
Very fast
In memory (persistence coming), business logic inside the data
Consistent and Available

Transactional, redundant
Written in Java, data are POJOs

Not necessary
Clients in Java, Microsoft, etc

8
Use cases for this presentation
9
Train Booking System
trains, stations, seats, booking and passengers
10
eCommerce Web Site

warehouse & customers shopping carts
Customers' shopping carts (sample contents): canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ipad: 1, barbie: 1, cabbage-doll: 1, ...
Warehouse stocks (sample entry): { "name": "Barbie Computer", "stock": 637, "weight": 200 }
11
In Memory Data Grids Key Principles
12
Store Everything in a Mainframe !
3 TB of RAM, 80 x 5.2 GHz cores, much more than $1,000,000
http://ibm.com/
IBM z11
13
Spread on Inexpensive Servers
http://ibm.com/
Mainframe
http://1userverrack.net/
Cheap Servers !
14
Partition Data
Partition gamma
Small servers
Partition beta
MainFrame Partition alpha
Partition for scalability
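The idea of spreading keys over partitions can be sketched in plain Java. This is only an illustration under stated assumptions: the class `PartitionedGridSketch`, its methods and the hash-modulo routing rule are hypothetical and do not reflect how Coherence, GigaSpaces or eXtreme Scale actually assign keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: in-process "partitions", each owning a slice of the keys. */
final class PartitionedGridSketch {

    private final List<Map<String, String>> partitions = new ArrayList<>();

    PartitionedGridSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    /** Maps a key to the index of its owning partition; floorMod keeps the index non-negative. */
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, String value) {
        partitions.get(partitionOf(key)).put(key, value);
    }

    String get(String key) {
        return partitions.get(partitionOf(key)).get(key);
    }

    /** Number of entries held by one partition. */
    int sizeOf(int partition) {
        return partitions.get(partition).size();
    }

    public static void main(String[] args) {
        PartitionedGridSketch grid = new PartitionedGridSketch(3);
        grid.put("tgv-3071-20110512", "Paris -> Marseille");
        System.out.println("owning partition: " + grid.partitionOf("tgv-3071-20110512"));
        System.out.println("value: " + grid.get("tgv-3071-20110512"));
    }
}
```

Adding servers then means adding partitions (or moving them), which is why the key-to-partition rule matters for scalability.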

15
Duplicate Data
Partition alpha: master copy + standby backup, kept in sync (synchronous replication)
Duplicate data for high availability
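A minimal sketch of synchronous duplication, assuming a single partition with one master copy and one standby backup. The class `ReplicatedPartitionSketch` and its `failPrimary` method are hypothetical names used for illustration; real grids replicate across JVMs over the network.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one partition with a master copy and a standby backup. */
final class ReplicatedPartitionSketch {

    private Map<String, Integer> primary = new HashMap<>();
    private Map<String, Integer> backup = new HashMap<>();

    /** Synchronous replication: the write returns only after both copies are updated. */
    void put(String key, int value) {
        primary.put(key, value);
        backup.put(key, value);
    }

    Integer get(String key) {
        return primary.get(key);
    }

    /** Simulates the loss of the master: the backup is promoted and a new backup is rebuilt from it. */
    void failPrimary() {
        primary = backup;
        backup = new HashMap<>(primary);
    }
}
```

Because the backup is written before `put` returns, promoting it after a failure loses no acknowledged write, at the price of a slower write path.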
16
Data Access Patterns
17
Stored Procedures Style
- Can apply very complex business logic inside the data
This is not traditional Java EE coding style!
- Change management challenge!

18
Pattern : Targeted Operation
19
Pattern: Targeted Operation
{ "train-id": "tgv-3071-20110512", "time": "2011/05/12 12:15", "departure": "Paris", "arrival": "Marseille", "seats": 3 }
Search Trains: partitions alpha, beta, gamma
train-id is indexed
Book Train Tickets
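The targeted operation above can be sketched as follows: the booking logic is shipped to the one partition that owns the train, instead of pulling the train to the client. Everything here (`TargetedOperationSketch`, `invoke`, `book`, the hash-modulo routing) is a hypothetical illustration, not the API of any data grid product.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a targeted operation; not a real data grid API. */
final class TargetedOperationSketch {

    /** Minimal train entry: only the number of free seats is modelled. */
    static final class Train {
        int freeSeats;

        Train(int freeSeats) {
            this.freeSeats = freeSeats;
        }
    }

    private final List<Map<String, Train>> partitions = new ArrayList<>();

    TargetedOperationSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    /** The key alone decides which partition owns the entry. */
    private Map<String, Train> owner(String trainId) {
        return partitions.get(Math.floorMod(trainId.hashCode(), partitions.size()));
    }

    void put(String trainId, Train train) {
        Map<String, Train> partition = owner(trainId);
        synchronized (partition) {
            partition.put(trainId, train);
        }
    }

    /** Runs the operation next to the data, on the single partition that owns the key. */
    <R> R invoke(String trainId, Function<Train, R> operation) {
        Map<String, Train> partition = owner(trainId);
        synchronized (partition) {
            return operation.apply(partition.get(trainId));
        }
    }

    /** Booking logic executed inside the grid: succeeds only if enough seats remain. */
    static boolean book(Train train, int seats) {
        if (train == null || train.freeSeats < seats) {
            return false;
        }
        train.freeSeats -= seats;
        return true;
    }
}
```

A caller would write something like `grid.invoke("tgv-3071-20110512", t -> TargetedOperationSketch.book(t, 3))`; only one partition is touched, whatever the size of the grid.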

20
Pattern : Map Reduce Style Operation
21
Pattern: Map Reduce
{ "departure": "Paris", "arrival": "Marseille", "time": "2011/05/12 12:00", "seats": 3 }
Search Trains on every partition (alpha, beta, gamma)
Distributed Search Train Ticket

22
Pattern: Map Reduce
Partial results returned by the partitions:
{ "Paris -> Marseille : 12:15", "Paris -> Marseille : 13:15" }
{ #NONE# }
{ "Paris -> Lyon -> Marseille : 12:40" }

23
Pattern: Map Reduce
Search Trains, merged result:
{ "Paris -> Marseille : 12:15", "Paris -> Lyon -> Marseille : 12:40", "Paris -> Marseille : 13:15" }
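The three Map Reduce slides above (query sent to every partition, partial results, merged result) can be sketched in plain Java. This is a simplified, sequential illustration: the class `ScatterGatherSketch` and its methods are hypothetical, and a real grid would run the per-partition step in parallel on the data nodes. Which partition holds which train is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a map-reduce style search across partitions. */
final class ScatterGatherSketch {

    static final class Train {
        final String route;
        final String departure;
        final String arrival;
        final String time; // "HH:mm", so plain string comparison orders times correctly

        Train(String route, String departure, String arrival, String time) {
            this.route = route;
            this.departure = departure;
            this.arrival = arrival;
            this.time = time;
        }
    }

    /** "Map" step: each partition filters its own trains locally. */
    static List<Train> searchPartition(List<Train> partition, String from, String to, String notBefore) {
        List<Train> matches = new ArrayList<>();
        for (Train train : partition) {
            if (train.departure.equals(from) && train.arrival.equals(to)
                    && train.time.compareTo(notBefore) >= 0) {
                matches.add(train);
            }
        }
        return matches;
    }

    /** "Reduce" step: merge the partial results and sort them by departure time. */
    static List<String> search(List<List<Train>> partitions, String from, String to, String notBefore) {
        List<Train> merged = new ArrayList<>();
        for (List<Train> partition : partitions) {
            merged.addAll(searchPartition(partition, from, to, notBefore));
        }
        merged.sort((a, b) -> a.time.compareTo(b.time));
        List<String> result = new ArrayList<>();
        for (Train train : merged) {
            result.add(train.route + " : " + train.time);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Train> alpha = Arrays.asList(
                new Train("Paris -> Marseille", "Paris", "Marseille", "12:15"),
                new Train("Paris -> Marseille", "Paris", "Marseille", "13:15"));
        List<Train> beta = new ArrayList<>(); // this partition has no matching train
        List<Train> gamma = Arrays.asList(
                new Train("Paris -> Lyon -> Marseille", "Paris", "Marseille", "12:40"));
        System.out.println(search(Arrays.asList(alpha, beta, gamma), "Paris", "Marseille", "12:00"));
    }
}
```

Note that every partition is visited even when it has nothing to return, which is the cost the deck warns about when no index is available.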

24
This is not traditional Java EE coding style
- Change management
Don't forget: Map Reduce = Distributed Table Scan
- Use Indexes
25
CAP Theorem & In Memory Data Grids
26
CAP Theorem and In Memory Data Grid
Consistency, Availability, Partition Tolerance
Only 2 of these 3 properties can be achieved at any given moment in time
Brewer's Conjecture
http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
27
CAP Theorem and In Memory Data Grid
Where Data Grids sit: Consistency + Availability
Only 2 of these 3 properties (Consistency, Availability, Partition Tolerance) can be achieved at any given moment in time
Brewer's Conjecture
http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
28
Cross Data Center Data Consistency
London New York Tokyo
World wide replication for financial market

29
West Coast: { "name": "Barbie Computer", "stock": 147, "weight": 200 }
East Coast: { "name": "Barbie Computer", "stock": 147, "weight": 200 }
Warehouse stocks
30
West Coast: set stock to 146
West Coast -> East Coast: propagation delay!
31
West Coast: set stock to 146
East Coast: set weight 175
Reconciliation API needed!
32
West Coast: set stock to 146
East Coast: set weight 175
Network partitioning
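The warehouse scenario above (one site sets the stock to 146 while the other sets the weight to 175) is the kind of case a reconciliation API has to handle. Below is one possible sketch, a field-level three-way merge over flat records. The class `ReconciliationSketch`, the `reconcile` method and the "fail on conflicting edits" policy are assumptions chosen for illustration; the deck does not say how any product reconciles.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of a field-level reconciliation between two sites. */
final class ReconciliationSketch {

    /**
     * Three-way merge of flat records that share the same fields.
     * A field changed on one site only is taken from that site;
     * a field changed differently on both sites is reported as a conflict.
     */
    static Map<String, Object> reconcile(Map<String, Object> base,
                                         Map<String, Object> west,
                                         Map<String, Object> east) {
        Map<String, Object> merged = new HashMap<>(base);
        for (String field : base.keySet()) {
            Object original = base.get(field);
            Object westValue = west.get(field);
            Object eastValue = east.get(field);
            boolean westChanged = !Objects.equals(original, westValue);
            boolean eastChanged = !Objects.equals(original, eastValue);
            if (westChanged && eastChanged && !Objects.equals(westValue, eastValue)) {
                throw new IllegalStateException("Conflicting updates on field: " + field);
            }
            merged.put(field, westChanged ? westValue : eastValue);
        }
        return merged;
    }
}
```

With the values from the slides, the merged record keeps stock 146 from the West Coast and weight 175 from the East Coast; two different stock values would instead need a business rule to decide.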
33
Data Modeling
34
Data Modeling
Dominant Question Driven Design
- Opposite to Relational, which is Domain Driven Design
Constrained Tree Schema
- Because RPC matters
Denormalized
- Due to dominant questions and CTS
35
Data Modeling
Seat (number, price)
Train (code, type)
TrainStop (date)
Booking (reduction)
Passenger (name)
TrainStation (code, name)
Typical relational data model
36
Data Modeling
Partitioning ready entities tree
Root entity: Train (code, type)
Seat (number, price)
Booking (reduction)
Passenger (name)
TrainStop (date)
Reference data, duplicated in each grid node: TrainStation (code, name)
Find the root entity and denormalize
37
Data Modeling
Remove unused data
Seat (number, price, booked)
Train (code, type)
TrainStop (date)
Booking (reduction)
Passenger (name)
Legend: Partitioned / Replicated
38
Data Modeling
Train (code, type)
Seat (number, price, booked)
TrainStop (date)
Legend: Partitioned / Replicated
Data Grid Ready data structure

39
Data Modeling is Hard !
40
Account (number) -> CashWithdrawal (date, amount)
Account (number) -> CashWithdrawal (date, amount)
MoneyTransfer (id, date, amount): "from" one Account, "to" the other
Two root entities for the same MoneyTransfer !

41
Account number
Account number
CashWithdrawal (date, amount)
MoneyTransferIn (id, date, amount)
MoneyTransferOut (id, date, amount)
Split MoneyTransfer
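The split can be sketched as two small entities that share the transfer id, each stored under its own Account root. The class `SplitTransferSketch` and its members are hypothetical; the sketch only shows the data shape and deliberately ignores how the two writes are coordinated, since transactions are outside the scope of this deck.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one logical MoneyTransfer stored as two halves, one per Account root. */
final class SplitTransferSketch {

    /** Outgoing half, stored in the tree of the debited account. */
    static final class MoneyTransferOut {
        final String id;
        final String date;
        final long amount;

        MoneyTransferOut(String id, String date, long amount) {
            this.id = id;
            this.date = date;
            this.amount = amount;
        }
    }

    /** Incoming half, stored in the tree of the credited account. */
    static final class MoneyTransferIn {
        final String id;
        final String date;
        final long amount;

        MoneyTransferIn(String id, String date, long amount) {
            this.id = id;
            this.date = date;
            this.amount = amount;
        }
    }

    /** Root entity: everything below it lives in the same partition. */
    static final class Account {
        final String number;
        final List<MoneyTransferOut> transfersOut = new ArrayList<>();
        final List<MoneyTransferIn> transfersIn = new ArrayList<>();

        Account(String number) {
            this.number = number;
        }
    }

    /** Records one logical transfer as two entries that share the same id. */
    static void transfer(String id, String date, Account from, Account to, long amount) {
        from.transfersOut.add(new MoneyTransferOut(id, date, amount));
        to.transfersIn.add(new MoneyTransferIn(id, date, amount));
    }
}
```

Each half can now be read and written with a targeted operation on a single account; the shared id is what allows the two halves to be matched again later.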
42
Account number
Account number
Split MoneyTransfer
43
Account number
Data Grid Ready data structure

44
Grid Internals
45
Data Serialization
Used for data transfer and byte oriented storage

Must support evolvable data structure
Hot topic, like Apache Thrift, Apache Avro, Google Protocol Buffers
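What "evolvable data structure" means in practice can be sketched with the JDK's own `DataOutputStream`. In this illustration, an old reader (version 1) meets bytes written by a newer writer (version 2): it reads the fields it knows and preserves the rest so that nothing is lost when it writes the object back. The class `EvolvableSketch`, the byte layout and the field names are assumptions made for the example; this is not the wire format of Coherence POF, Thrift, Avro or Protocol Buffers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a serialization format that tolerates newer versions. */
final class EvolvableSketch {

    /** Version 1 of the class: it only knows the 'code' field. */
    static final class TrainV1 {
        int dataVersion = 1;
        String code;
        byte[] futureData = new byte[0]; // bytes written by newer versions, kept untouched

        static TrainV1 read(byte[] bytes) {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
                TrainV1 train = new TrainV1();
                train.dataVersion = in.readInt();
                train.code = in.readUTF();
                train.futureData = new byte[in.available()];
                in.readFully(train.futureData);
                return train;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        byte[] write() {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                out.writeInt(dataVersion);
                out.writeUTF(code);
                out.write(futureData); // unknown fields are written back as they were read
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return buffer.toByteArray();
        }
    }

    /** Simulates a newer (version 2) writer that appends a 'name' field after 'code'. */
    static byte[] writeV2(String code, String name) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(2);
            out.writeUTF(code);
            out.writeUTF(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

The design choice shown is "append only, preserve what you do not understand", which lets old and new nodes coexist in the same grid during a rolling upgrade.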
46
Data Storage
Store Java Beans in the grid
- No need to unmarshall for in-process operations
- Beware of garbage collector!
Store byte arrays in the grid
- Pay unmarshalling at each read and write
- Low-level / byte-oriented APIs to read data
- Slightly more garbage collector friendly
47
Communication Protocols
UDP multicast (Coherence, GigaSpaces)
TCP/IP (WebSphere eXtreme Scale)
48
Topology
Partitions made of shards (1 primary + 0..* backups)
Dynamic shards location (changes at runtime and at restart)
Can use dedicated directory servers or embed it in the data nodes
49
JVM and Memory
Many vendors recommend tiny 1.4 GB JVMs!
- Garbage collector hell
More than ten JVMs per server
- Management hell
More and more IMDGs support large heaps

50
APIs
51
Raw Java Mapping with Oracle Coherence

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.tangosol.io.AbstractEvolvable;
import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofWriter;
import com.tangosol.io.pof.PortableObject;

public class Train extends AbstractEvolvable implements PortableObject {

    enum Type { HIGH_SPEED, NORMAL }

    /** Key of the Cache */
    String code;
    /** Indexed */
    String name;
    Type type;
    List<Seat> seats = new ArrayList<Seat>();
    int version;
    List<TrainStop> trainStops = new ArrayList<TrainStop>();

    @Override
    public int getImplVersion() {
        return 1;
    }

    /** Reads the fields in the same index order as writeExternal wrote them. */
    @Override
    public void readExternal(PofReader pofReader) throws IOException {
        this.code = pofReader.readString(0);
        this.name = pofReader.readString(1);
        this.type = (Type) pofReader.readObject(2);
        pofReader.readCollection(3, this.seats);
        pofReader.readCollection(4, this.trainStops);
        this.version = pofReader.readInt(5);
    }

    /** Each field gets a fixed POF index; indexes must stay stable across versions. */
    @Override
    public void writeExternal(PofWriter pofWriter) throws IOException {
        pofWriter.writeString(0, this.code);
        pofWriter.writeString(1, this.name);
        pofWriter.writeObject(2, this.type);
        pofWriter.writeCollection(3, this.seats, Seat.class);
        pofWriter.writeCollection(4, this.trainStops, TrainStop.class);
        pofWriter.writeInt(5, this.version);
    }
}
Hand-coded serialization: JUnit is your friend!
52
JPA Style Mapping with Websphere eXtreme Scale

@Entity(schemaRoot=true)
public class Train {

    @Id
    String code;

    @Index @Basic
    String name;

    @OneToMany(cascade=CascadeType.ALL)
    List<Seat> seats = new ArrayList<Seat>();

    @Version
    int version;

    ...
}
Sub-entities can have cross relations

53
Map API with Oracle Coherence

NamedCache trainCache = CacheFactory.getCache("train-cache");

/** Save */
void persist(Train train) {
    trainCache.put(train.getCode(), train);
}

/** Find by key */
Train findByCode(String code) {
    return (Train) trainCache.get(code);
}

/** Find by Query Language */
Train findByTrainName(String name) {
    Filter filter = QueryHelper.createFilter("name = :name",
            Collections.singletonMap("name", name));
    Set<Map.Entry<String, Train>> trainEntrySet = trainCache.entrySet(filter);
    if (trainEntrySet.isEmpty()) {
        return null;
    } else {
        return trainEntrySet.iterator().next().getValue();
    }
}
Map API
54
JPA Style with Websphere eXtreme Scale

/** Save */
void persist(Train train) {
    entityManager.persist(train);
}

/** Find by key */
Train findByCode(String code) {
    return (Train) entityManager.find(Train.class, code);
}

/** Query Language */
Train findByTrainName(String name) {
    Query q = entityManager.createQuery("select t from Train t where t.name=:name");
    q.setParameter("name", name);
    return (Train) q.getSingleResult();
}
JPA Style Entity Manager

55
Creating Indexes
Map reduce (without index) = Distributed Table Scan !
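The difference between a scan and an indexed lookup can be sketched on a single partition in plain Java. The class `IndexSketch` is hypothetical: it keeps a secondary map from train name to train codes next to the primary map, and counts how many entries a scan has to visit. In a grid, the scan cost is paid on every partition.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: the same query answered by a full scan and by an index. */
final class IndexSketch {

    private final Map<String, String> nameByCode = new HashMap<>();       // primary storage: code -> name
    private final Map<String, Set<String>> codesByName = new HashMap<>(); // index: name -> codes
    int entriesVisitedByLastScan;

    /** Every write also maintains the index, which is the price of fast reads. */
    void put(String code, String name) {
        String previousName = nameByCode.put(code, name);
        if (previousName != null) {
            codesByName.get(previousName).remove(code);
        }
        codesByName.computeIfAbsent(name, n -> new TreeSet<>()).add(code);
    }

    /** Without an index: every entry is visited. */
    Set<String> findByNameScan(String name) {
        entriesVisitedByLastScan = 0;
        Set<String> codes = new TreeSet<>();
        for (Map.Entry<String, String> entry : nameByCode.entrySet()) {
            entriesVisitedByLastScan++;
            if (entry.getValue().equals(name)) {
                codes.add(entry.getKey());
            }
        }
        return codes;
    }

    /** With an index: a single lookup, whatever the number of entries. */
    Set<String> findByNameIndexed(String name) {
        Set<String> codes = codesByName.get(name);
        return codes == null ? Collections.<String>emptySet() : codes;
    }
}
```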
56
Indexes with Oracle Coherence
class Train {
    String name;

    Collection<String> getTrainStationsCodes() {
        return Collections2.transform(trainStops, ...);
    }
}
...
NamedCache trainCache = CacheFactory.getCache("train-cache");
trainCache.addIndex(new ReflectionExtractor("getName"), false, null);
trainCache.addIndex(new ReflectionExtractor("getTrainStationsCodes"), false, null);
57
Indexes with Websphere eXtreme Scale

@Entity(schemaRoot=true)
class Train {

    @Index @Basic
    String name;

    @Index
    Collection<String> getTrainStationsCodes() {
        return Collections2.transform(trainStops, ...);
    }
}
...
Query query = em.createQuery("select t from Train t where t.name=:name");
query.getPlan();

This is an execution plan:
for q2 in Train ObjectMap using INDEX on name = ( ?name)
    filter ( q2.c[0] = ?name )
    returning new Tuple( q2 )
58
More APIs
Another Java EE versus Spring battle?
- JSR 347 Data Grids vs. Spring Data
Serialization / Object to Tuple Mapping API?
Unified API on top of NoSQL stores?
59
Data Grid <-> Relational Database Interactions
60
Data Grid <-> Relational Database
Data Grids are In Memory -> we need to persist data on disk !
61
Data Grid -> DB: update / insert / delete
DB -> Data Grid: select
Data can also be directly modified in the DB
62

Data Grid -> Relational Database
backend DB
Highly available write behind queues + SQL batched statements
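A write-behind queue with batching can be sketched as follows. The class `WriteBehindSketch` is hypothetical: it acknowledges writes as soon as the in-memory copy is updated, coalesces repeated writes to the same key, and hands the pending changes to the "database" as one batch. The database is only a list of executed batches here, and the high-availability aspect (replicating the queue itself) is not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a write-behind queue flushing batched updates. */
final class WriteBehindSketch {

    private final Map<String, Integer> cache = new HashMap<>();
    /** Pending writes, keyed by product id so that repeated writes are coalesced. */
    private final Map<String, Integer> pending = new LinkedHashMap<>();
    /** Stand-in for the backend database: each element is one batch that was "executed". */
    final List<Map<String, Integer>> executedBatches = new ArrayList<>();

    /** The caller gets control back immediately; the database is updated later. */
    void put(String productId, int stock) {
        cache.put(productId, stock);
        pending.put(productId, stock);
    }

    Integer get(String productId) {
        return cache.get(productId);
    }

    /**
     * Sends all pending writes as a single batch and returns the batch size.
     * In a real deployment this would run on a timer and use batched SQL statements.
     */
    int flush() {
        if (pending.isEmpty()) {
            return 0;
        }
        Map<String, Integer> batch = new LinkedHashMap<>(pending);
        executedBatches.add(batch);
        pending.clear();
        return batch.size();
    }
}
```

Coalescing is what makes the pattern cheap for the database: a stock value changed many times between two flushes costs a single update.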

63

Data Grid -> Relational Database
Train code type
TrainStop date
Constrained Tree Schema <-> Relational Impedance Mismatch

64
DB writes MUST succeed !

- Prefer raw SQL rather than reused business logic
- Denormalize the database
- Remove the foreign keys, use same PKs in DB and data grid
- Support unordered SQL statements
Align the database on the Data Grid model !

65

Relational Database -> Data Grid
Data Grid Originated Scheduled Refresh (Oracle System Change Number, etc)
select * from train where last_modif > ?
Backend DB
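The scheduled refresh can be sketched with a high-water mark: the grid remembers the largest `last_modif` it has seen and asks only for rows modified after it. The class `ScheduledRefreshSketch` is hypothetical, and the "table" is an in-memory list standing in for the backend database and for the query `select * from train where last_modif > ?`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a data grid refreshing itself from the database. */
final class ScheduledRefreshSketch {

    /** One row of the 'train' table, reduced to what the sketch needs. */
    static final class Row {
        final String code;
        final String name;
        final long lastModif;

        Row(String code, String name, long lastModif) {
            this.code = code;
            this.name = name;
            this.lastModif = lastModif;
        }
    }

    private final Map<String, String> nameByCode = new HashMap<>();
    private long highWaterMark = Long.MIN_VALUE;

    /** Stand-in for: select * from train where last_modif > ? */
    static List<Row> selectModifiedSince(List<Row> table, long since) {
        List<Row> rows = new ArrayList<>();
        for (Row row : table) {
            if (row.lastModif > since) {
                rows.add(row);
            }
        }
        return rows;
    }

    /** One refresh cycle: load only what changed, then advance the high-water mark. */
    int refresh(List<Row> table) {
        List<Row> changed = selectModifiedSince(table, highWaterMark);
        for (Row row : changed) {
            nameByCode.put(row.code, row.name);
            highWaterMark = Math.max(highWaterMark, row.lastModif);
        }
        return changed.size();
    }

    String get(String code) {
        return nameByCode.get(code);
    }
}
```

A scheduler would call `refresh` periodically; the sketch assumes that `last_modif` only grows and that deletions are handled separately.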
66

Relational Database -> Data Grid
Database Originated Push (Oracle Database Change Notification, etc)
JMS = durable subscription
Backend DB
67
In Memory -> prepare for reloading after maintenance operations!
Need for graceful shutdown with disk persistence
Prepare consistency checkers
68
Transactions
69
We didn't have the time to talk about transactions. Another session is planned at the Paris NoSQL User Group for this.
70
Let's go live!
71
Data Grids and Operations
Standard packaging?
- Do It Yourself (layout, scripts, etc)
Limited management
- Do It Yourself (stop/start, detecting data loss, etc)
Limited debugging tools
- Do It Yourself (debugging consoles, troubleshooting agents)
JVM pandemic
- Dozens of JVMs to manage!
72
Data Grids and Operations
Dev / Ops collaboration is required
Experts only!
73
The right tool for the right job
74
The right tool for the right job
Incredibly fast! Even with transactions!
Scalable
- If you solve the data loading issue
Good at data replication (when it implements it)
- Reconciliation API, etc
Very geeky on both dev and ops side
- Not an enterprise grade data store
- Requires very skilled people + change management
Quite expensive
75
Questions / Answers
?
76
