
Landscape of Open Source Transactional Storage Engines
Peter Zaitsev
Vadim Tkachenko

http://MySQLPerformanceBlog.com
About us
- Founders of Percona Ltd
- MySQL performance-focused consulting
- Authors of http://www.MySQLPerformanceBlog.com
- Worked for MySQL AB for years
- Peter: lead of the “High Performance Group”; Vadim: his right hand
- Long-time MySQL users in a number of projects we were personally involved in
MySQL pluggable architecture
MySQL Transactional Engines
- BDB: legacy storage engine, removed in 5.1, not tested
- InnoDB: the “most popular” (the only commonly used) storage engine, by Innobase Oy
- SolidDB: storage engine from Solid Information Technology
- PBXT: storage engine by SNAP Innovation (Paul McCullagh)
- Falcon: new storage engine by MySQL AB, project led by Jim Starkey
- NDB: MySQL Cluster is a whole other beast and is not covered
InnoDB
- http://www.innodb.com/
- Mature storage engine; development started by Heikki Tuuri over 10 years ago
- Heikki was looking for ways to improve the performance of traditional databases
- Acquired by Oracle at the end of 2005
- The only transactional storage engine available in the official MySQL 5.0 release
solidDB
- http://www.solidtech.com/solidDBforMySQL/
- Open-sourced in 2006
- Existing storage engine technology “integrated” with MySQL
- Focused on reliability and multiprocessor scalability
- Currently shipped as production-ready
PrimeBase XT (PBXT)
- http://www.primebase.com/xt/
- Written mainly by Paul McCullagh since 2005
- Not a port of an existing storage engine to MySQL, but a new write-up
- Uses a number of unusual design decisions
- Only “50%” transactional
- Focused on efficient BLOB storage
- http://www.blobstreaming.org/
Falcon
- http://dev.mysql.com/doc/falcon/en/index.html
- Based on the “Netfrastructure” engine by Jim Starkey
- Purchased by MySQL AB in early 2006
- “Lightweight design”
- Focused on the transactional needs of web applications and on efficient use of large amounts of memory
Design and Behavior
InnoDB design
- MVCC and very efficient row-level locks
- Clustering by primary key; writes go to the same pages
- Non-compressed secondary indexes with transaction info
- Single tablespace or tablespace per table
- Pessimistic locking
- Instant deadlock detection
- Fuzzy checkpointing
- “Doublewrite” for partial page write protection
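The “Doublewrite” idea can be illustrated with a toy model: each page image is first written in full to a separate doublewrite area and only then to its real location, so a page that was half-written at crash time (a torn page) can be restored from the intact copy during recovery. This is a minimal Python sketch under stated assumptions, not InnoDB's code: the 16-byte page, the CRC32 trailer and the class and method names are all hypothetical stand-ins for InnoDB's real 16 KB pages and page checksums.

```python
import zlib

PAGE_SIZE = 16  # toy page size in bytes (assumption; real InnoDB pages are 16 KB)


def with_checksum(data: bytes) -> bytes:
    """Append a CRC32 trailer so a torn (partially written) page can be detected."""
    return data + zlib.crc32(data).to_bytes(4, "big")


def is_intact(page: bytes) -> bool:
    data, stored = page[:-4], page[-4:]
    return zlib.crc32(data).to_bytes(4, "big") == stored


class ToyTablespace:
    def __init__(self, n_pages: int):
        self.pages = [with_checksum(bytes(PAGE_SIZE))] * n_pages
        self.doublewrite = {}  # page_no -> full page image written before the in-place write

    def flush(self, page_no: int, data: bytes, crash_after=None):
        page = with_checksum(data)
        # Step 1: write the complete page image to the doublewrite area first.
        self.doublewrite[page_no] = page
        # Step 2: write it in place. crash_after simulates power loss after
        # only the first N bytes reached the disk (a torn write).
        if crash_after is None:
            self.pages[page_no] = page
        else:
            old = self.pages[page_no]
            self.pages[page_no] = page[:crash_after] + old[crash_after:]

    def recover(self):
        """On startup, restore every torn page from its intact doublewrite copy."""
        repaired = []
        for page_no, copy in self.doublewrite.items():
            if not is_intact(self.pages[page_no]) and is_intact(copy):
                self.pages[page_no] = copy
                repaired.append(page_no)
        self.doublewrite.clear()
        return repaired


ts = ToyTablespace(n_pages=4)
ts.flush(1, b"B" * PAGE_SIZE)                 # clean write
ts.flush(2, b"A" * PAGE_SIZE, crash_after=5)  # torn write: crash after 5 bytes
torn_before_recovery = not is_intact(ts.pages[2])
repaired = ts.recover()
print(torn_before_recovery, repaired)
```

The `is_intact(copy)` check matters because a crash can also hit the doublewrite write itself; in that case the in-place page was never touched and is still the old, intact version.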
InnoDB
- DEADLOCK detection
- Session 1: BEGIN;
  Session 2: BEGIN;
  Session 1: UPDATE test SET name='random1-1' WHERE id=1;
  Session 2: UPDATE test SET name='random2-1' WHERE id=2;
  Session 1: UPDATE test SET name='random1-2' WHERE id=2;
  Session 2: UPDATE test SET name='random2-2' WHERE id=1;
- InnoDB detects the deadlock (error 1213) instantly in the second session
- Pessimistic locking:
- UPDATE the same row in two concurrent transactions: the second transaction waits for COMMIT/ROLLBACK in the first
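Instant deadlock detection can be pictured as a check on a wait-for graph performed every time a transaction is about to wait for a row lock: if the current lock owner already waits (directly or transitively) on the requester, waiting would close a cycle, so the requester gets a deadlock error (1213 in the example above) instead of sleeping. The sketch below is a minimal model, not InnoDB's implementation; it assumes each transaction waits on at most one other, and the class and method names are hypothetical. The same cycle check could instead be run periodically, which is how the Falcon slide later describes Falcon's behavior.

```python
class WaitForGraph:
    """Toy row-lock manager with a wait-for graph (hypothetical names)."""

    def __init__(self):
        self.holder = {}     # row id -> transaction holding its lock
        self.waits_for = {}  # waiting transaction -> transaction it waits on

    def _reaches(self, start, target):
        """True if following wait edges from `start` leads to `target`."""
        seen = set()
        tx = start
        while tx is not None and tx not in seen:
            if tx == target:
                return True
            seen.add(tx)
            tx = self.waits_for.get(tx)
        return False

    def lock(self, tx, row):
        owner = self.holder.get(row)
        if owner is None or owner == tx:
            self.holder[row] = tx
            return "granted"
        # Instant detection: look for a cycle before going to sleep.
        if self._reaches(owner, tx):
            return "deadlock"  # the caller would roll back `tx` here
        self.waits_for[tx] = owner
        return "waiting"


# Replay of the slide: each session locks its own row, then the other one.
g = WaitForGraph()
steps = [("session 1", 1), ("session 2", 2), ("session 1", 2), ("session 2", 1)]
results = [g.lock(tx, row) for tx, row in steps]
print(results)
```

Replaying the four UPDATEs gives "granted", "granted", "waiting" and then "deadlock" for session 2, which matches where the error appears in the InnoDB example.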
InnoDB Strengths
- Powerful MVCC
- Good performance on a wide range of workloads
- Great stability
- Great data protection
- Primary key clustering allows a lot of optimizations
- Transaction info in secondary indexes allows fast index-only scans
- Adaptive hash indexes and other advanced techniques
InnoDB Weaknesses
- Slow development pace in recent years
- Still has scalability issues with multiple CPUs
- Unscalable auto-increment and broken group commit are taking very long to fix
- Large footprint, especially for secondary indexes
- It turns out not to be so large when we compare
- Still messy integration with MySQL
- How do you see how much space is free in an InnoDB tablespace?
SolidDB Design
- MVCC and row-level locking
- Clustering by primary key
- New data stored in new pages
- “Bonsai Tree” used for multiversioning
- OPTIMISTIC or PESSIMISTIC locking, specified at table level
- Online backup (not usable for slave creation)
- High-availability synchronous replication promised soon
solidDB: PESSIMISTIC
- DEADLOCK: deadlock detected in the first session after 20 sec of waiting
- Timeout-based deadlock detection
- Concurrent UPDATE of the same row: the second session waits on the first
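Timeout-based resolution needs no cycle check at all: a session simply gives up once its lock wait exceeds a limit, rolls back, and thereby releases the rows the other session is waiting for. The sketch below models this with `threading.Lock.acquire(timeout=...)` from the Python standard library. The timeouts are shrunk to fractions of a second, and session 1 gets the shorter one to stand in for "it started waiting first"; both are assumptions of this sketch, not SolidDB settings.

```python
import threading

# Two rows, each protected by a plain lock; there is no wait-for graph.
row_1, row_2 = threading.Lock(), threading.Lock()
crossed = threading.Barrier(2)  # ensures both sessions hold their first row
outcome = {}


def session(name, first, second, lock_wait_timeout):
    with first:          # first UPDATE succeeds
        crossed.wait()   # now each session holds the row the other needs
        # Second UPDATE: wait at most lock_wait_timeout seconds for the row.
        if second.acquire(timeout=lock_wait_timeout):
            second.release()
            outcome[name] = "committed"
        else:
            outcome[name] = "timed out, rolled back"
    # Leaving the with-block releases `first`, as a commit or rollback would.


# Session 1 gets the shorter timeout, standing in for "it has waited longer".
t1 = threading.Thread(target=session, args=("session 1", row_1, row_2, 0.2))
t2 = threading.Thread(target=session, args=("session 2", row_2, row_1, 5.0))
t1.start()
t2.start()
t1.join()
t2.join()
print(outcome)
```

The cost of this design is visible in the slide: the deadlock is only resolved after the full timeout (20 seconds there), during which both sessions hold their locks.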
solidDB: OPTIMISTIC
- DEADLOCK: detected in the second session immediately, but reported as error 1205 (Lock wait timeout exceeded)
- Concurrent UPDATE of the same row:
- SESSION 1: BEGIN;
  SESSION 2: BEGIN;
  SESSION 1: UPDATE test SET name = 'rnd' WHERE id=2;
  SESSION 2: UPDATE test SET name = 'rnd' WHERE id=2;
- In session 2 we got:
  ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
- This is OK for OPTIMISTIC engines, but may cause trouble in web applications.
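The behavior above, where the second writer fails at once instead of waiting, is a write-write conflict check without lock waits (sometimes called first-updater-wins). A minimal Python sketch of that reading follows. It models only the observed behavior, not SolidDB's or PBXT's internals; the `Table` and `Transaction` classes are hypothetical, and `WriteConflict` stands in for the engine errors (1205 from SolidDB here, 1020 from PBXT later).

```python
class WriteConflict(Exception):
    """Stands in for the engine error returned to the second writer."""


class Table:
    def __init__(self, rows):
        self.rows = dict(rows)  # committed values: id -> name
        self.pending = {}       # id -> transaction holding an uncommitted write


class Transaction:
    def __init__(self, table, name):
        self.table, self.name, self.writes = table, name, {}

    def update(self, row_id, value):
        owner = self.table.pending.get(row_id)
        if owner is not None and owner is not self:
            # No lock wait: the later writer fails immediately and must retry.
            raise WriteConflict(f"row {row_id} changed by {owner.name}")
        self.table.pending[row_id] = self
        self.writes[row_id] = value

    def commit(self):
        for row_id, value in self.writes.items():
            self.table.rows[row_id] = value
            del self.table.pending[row_id]
        self.writes.clear()

    def rollback(self):
        for row_id in self.writes:
            del self.table.pending[row_id]
        self.writes.clear()


table = Table({2: "old"})
s1, s2 = Transaction(table, "session 1"), Transaction(table, "session 2")
s1.update(2, "rnd")
conflict_seen = False
try:
    s2.update(2, "rnd")
except WriteConflict as err:
    conflict_seen = True
    print("session 2:", err)
    s2.rollback()
s1.commit()
```

For a web application this means every write path needs retry logic, which is the trouble the slide warns about.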
SolidDB Strengths and Weaknesses
- Too limited production usage to really tell
- Of the storage engines reviewed, the most similar in design to InnoDB
- The choice of optimistic vs. pessimistic locking is nice for some applications
- No instant deadlock detection
- So far available as a special download only (not even a plugin)
PBXT Design
- MVCC with row-level locking
- “Per-database” transactions
- No real durability yet; weak crash recovery
- OPTIMISTIC locking
- Write once, write sequentially to the log
- Never update in place
- Data cache + key cache
- Efficient BLOB handling
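“Write once, write sequentially, never update in place” can be pictured as a log-structured store: every update appends a new record version and moves an in-memory pointer, and superseded versions pile up until a purge rewrites the live ones. This toy Python sketch is not PBXT's on-disk format (the class and methods are hypothetical); it only shows why sequential writes are cheap and where the potentially large purging overhead, listed among PBXT's weaknesses, comes from.

```python
class AppendOnlyStore:
    """Toy log-structured table: writes append, nothing is overwritten."""

    def __init__(self):
        self.log = []    # the sequential "file": (row_id, value) records
        self.index = {}  # row_id -> position of the newest record in the log

    def write(self, row_id, value):
        self.log.append((row_id, value))        # sequential append only
        self.index[row_id] = len(self.log) - 1  # point at the newest version

    def read(self, row_id):
        return self.log[self.index[row_id]][1]

    def garbage(self):
        """Dead (superseded) records that a purge would have to reclaim."""
        return len(self.log) - len(self.index)

    def purge(self):
        """Copy only live records into a fresh log: the deferred cost of never updating in place."""
        live = sorted(self.index.items(), key=lambda item: item[1])
        self.log = [self.log[pos] for _, pos in live]
        self.index = {row_id: i for i, (row_id, _) in enumerate(live)}


s = AppendOnlyStore()
s.write(1, "a")
s.write(2, "b")
s.write(1, "c")  # update of row 1: appends, the old version becomes garbage
print(s.read(1), s.garbage())
s.purge()
print(len(s.log), s.garbage())
```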
PBXT
- DEADLOCK detected in the second session, error 1213
- Concurrent UPDATE of the same row (optimistic), second session:
  ERROR 1020 (HY000): Record has changed since last read in table 'test2'
PBXT Strengths and Weaknesses
- Not yet commonly used in production (we tried, but got too many bugs)
- Very good performance for some workloads
- Efficient storage, close to MyISAM
- Focused on efficient BLOB handling, with extra features like BLOB Streaming
- Still mainly a one-man project
- Large to-do list; a lot needs to be done, including recovery
- Potentially large purging overhead
Falcon Design
- MVCC, row-level locking (in practice, not in theory)
- PESSIMISTIC locking
- Not clustered by primary key
- Row cache (caches only the rows you need)
- “Optimal” index traversal
- “Data compression”: NULLs, empty strings
- Always needs to read row data (because of the index structure)
Falcon
- DEADLOCK:
  In session 2:
  ERROR 1020 (HY000): Record has changed since last read in table 'test2'
- Ann Harrison says Falcon checks for cycles in the lock graph periodically, rather than instantly on each row lock wait
- UPDATE:
  The second session waits
Falcon Strengths and Weaknesses
- Still alpha with many bugs: too early to judge
- Very active support from MySQL AB
- Fast development pace: bugs are fixed quickly, major performance improvements during the last 3 months
- Good integration with MySQL, e.g. tables for performance data
- No primary key clustering or covering index support
- Different design decisions can complicate migration from InnoDB (though logical behavior became closer)
There are lies, big lies, and there are benchmarks
Benchmarks: things to note
- Benchmarks may not be relevant to the performance of your application
- The early versions of Falcon and PBXT we tried may change their performance properties before production
- There is not much experience out there with tuning Falcon, PBXT and Solid with MySQL, as they are barely used in production
- We did fewer benchmarks than we wanted: we spent a lot of time fighting/reporting bugs and checking fixes
Benchmarks
- Read-only on a typical table for a web application
- DBT2: TPC-C emulation
- Dell DVD Store: emulation of an e-commerce site
- Sysbench: OLTP transactions
- Sqlbench: small data set, single user, typical query patterns
Box
- Dell PowerEdge 2950
- CentOS release 4.5
- 4 CPUs
  model name : Intel(R) Xeon(R) CPU 5148 @ 2.33GHz
  stepping : 6
  cpu MHz : 2327.529
  cache size : 4096 KB
- 16 GB of RAM
- RAID 10 (6 10K RPM 3.5” SAS hard drives)
MySQL Versions
- Yes, this means the MySQL version affects performance, not only the storage engine, but we could not get all storage engines working with the same MySQL version.
- InnoDB and PBXT: 5.1.19
- Falcon: 6.0.1-alpha, bk tree from 10-Jul
- SolidDB: 5.0.41-0073
Engine parameters
- 12 GB of RAM for buffers
- InnoDB
  --innodb_buffer_pool_size=12G
  --innodb_flush_method=O_DIRECT
  --innodb-log-file-size=100M
- SolidDB
  --soliddb-cache-size=12G
- Falcon
  --falcon_min_record_memory=2G
  --falcon_max_record_memory=4G
  --falcon_page_cache_size=8G
- PBXT
  pbxt_index_cache_size=8G
  pbxt_record_cache_size=4G
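One way to check the "12 GB of RAM for buffers" figure against the flags is to sum the memory-cache options per engine. The sketch below does that arithmetic in Python. It rests on two assumptions of ours: `falcon_max_record_memory` (not the min) is the relevant record-cache ceiling, and `innodb-log-file-size` is left out because it sizes an on-disk log file rather than a memory buffer. `parse_size` is a hypothetical helper handling only the K/M/G suffixes used on the slide.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a MySQL-style size such as '12G' or '100M' into bytes."""
    text = text.strip().upper()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


# Memory-cache options per engine, copied from the slide (see assumptions above).
BUFFERS = {
    "InnoDB": {"innodb_buffer_pool_size": "12G"},
    "SolidDB": {"soliddb-cache-size": "12G"},
    "Falcon": {"falcon_max_record_memory": "4G", "falcon_page_cache_size": "8G"},
    "PBXT": {"pbxt_index_cache_size": "8G", "pbxt_record_cache_size": "4G"},
}


def total_gib(options):
    return sum(parse_size(value) for value in options.values()) / UNITS["G"]


totals = {engine: total_gib(options) for engine, options in BUFFERS.items()}
for engine, gib in totals.items():
    print(f"{engine}: {gib:g} GiB")
```

Under these assumptions every engine ends up with the same 12 GiB budget; Falcon and PBXT simply split it between a row/record cache and a page/index cache.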
DBT2 Configuration Details
- DBT2
- http://osdldbt.sourceforge.net/
- 10 concurrent users (about 2 for each CPU core and disk)
- “Zero delay” to fully load the MySQL server
- In the 400W configuration, available memory was reduced to 4 GB by locking 12 GB of memory, to make the workload IO-bound
- Buffer sizes were reduced to 2 GB
DBT2: 10 warehouses
- 10 warehouses, 10 clients (data size ~700M)
- Result in New Order Transactions Per Minute (NOTPM), more is better
- Chart: NOTPM by engine (InnoDB, SolidDB, Falcon, PBXT); bar values 17744, 8209, 6097
- PBXT crashed
- The old version of Falcon had ~1100 NOTPM: great improvement!
DBT2: 400 warehouses
- Data size ~29 GB
- SolidDB crashed after 336 mins
- Did not disable logs on SolidDB, to keep things comparable
- Chart: load time, min (InnoDB, PBXT, Falcon); bar values 136, 63, 40
DBT2, 400W, data size
- Size of loaded data: surprisingly large size from PBXT
- SolidDB: tables were loaded into MyISAM and then converted to SolidDB; it was crashing otherwise
- Chart: size of loaded data, MB (InnoDB, SolidDB, PBXT, Falcon); bar values 42191, 41770, 38266, 30726
DBT2, 400W, results
- PBXT crashed
- Result in New Order Transactions Per Minute (NOTPM), more is better
- Chart: NOTPM (InnoDB, SolidDB, Falcon); bar values 1105, 495, 178
Dell DVD Store
- Data size: Medium (1 GB): 2,000,000 customers, 100,000 products
- Falcon: crashed
- PBXT: a lot of errors
- Result in new orders per minute, more is better
- Chart: orders per minute (InnoDB, SolidDB); bar values 17589, 7594
sysbench
- An older Falcon was used in this test. The new one crashes :(
- A couple of READ-ONLY queries against a typical table for web applications (user account info):
CREATE TABLE IF NOT EXISTS sbtest (
id int(10) unsigned NOT NULL auto_increment,
name varchar(64) NOT NULL default '',
email varchar(64) NOT NULL default '',
password varchar(64) NOT NULL default '',
dob date default NULL,
address varchar(128) NOT NULL default '',
city varchar(64) NOT NULL default '',
state_id tinyint(3) unsigned NOT NULL default '0',
zip varchar(8) NOT NULL default '',
country_id smallint(5) unsigned NOT NULL default '0',
PRIMARY KEY (id),
KEY `country_id` (country_id,state_id,city)
);
sysbench, read by primary key
- SELECT name FROM sbtest WHERE id=?
- InnoDB and Solid have a sweet spot, being clustered by PK
- Chart: queries/sec vs. clients (1, 4, 16, 64, 128, 256) for InnoDB, Falcon, SolidDB, PBXT; y-axis 0 to 65,000
sysbench, read by index
- SELECT name FROM sbtest WHERE country_id=?
- PBXT excels
- Falcon comes next
- Chart: queries/sec vs. clients (1, 4, 16, 64, 128, 256) for InnoDB, Falcon, SolidDB, PBXT; y-axis 0 to 200
sysbench, read by covered index
- SELECT state_id FROM sbtest WHERE country_id=?
- PBXT still best
- Falcon can't use a covered index
- Chart: queries/sec vs. clients (1, 4, 16, 64, 128, 256) for InnoDB, Falcon, SolidDB, PBXT; y-axis 0 to 250
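Why a covered index matters can be shown with a toy table shaped like `sbtest`: a secondary index on (country_id, state_id, city) whose entries also carry the primary key, as in a PK-clustered engine. `SELECT state_id ... WHERE country_id=?` can be answered from the index entries alone. `SELECT name`, or any query on an engine that always reads row data (as the Falcon design slide says Falcon does), costs one extra row read per match. The sample rows and the `ToyTable` class are hypothetical, a dict stands in for the clustered B-tree, and MVCC visibility (the transaction info InnoDB keeps in secondary indexes) is not modeled.

```python
import bisect

# Hypothetical sample rows: (id, name, country_id, state_id, city)
ROWS = [
    (1, "alice", 1, 5, "Oslo"),
    (2, "bob", 2, 7, "Rome"),
    (3, "carol", 1, 3, "Bergen"),
    (4, "dave", 1, 5, "Oslo"),
]


class ToyTable:
    def __init__(self, rows):
        # A dict keyed by id stands in for the primary-key-clustered B-tree.
        self.rows = {row[0]: row for row in rows}
        # Secondary index on (country_id, state_id, city); each entry ends with the PK.
        self.index = sorted((r[2], r[3], r[4], r[0]) for r in rows)
        self.row_reads = 0  # how often we had to go to the row data

    def _entries(self, country_id):
        lo = bisect.bisect_left(self.index, (country_id,))
        hi = bisect.bisect_left(self.index, (country_id + 1,))
        return self.index[lo:hi]

    def _fetch(self, pk):
        self.row_reads += 1
        return self.rows[pk]

    def select_name(self, country_id):
        # `name` is not in the index, so every match costs a row read by PK.
        return [self._fetch(entry[-1])[1] for entry in self._entries(country_id)]

    def select_state_id(self, country_id, covered=True):
        if covered:
            # Covered: state_id is stored in the index entry itself.
            return [entry[1] for entry in self._entries(country_id)]
        # An engine that always reads row data pays a row read per match anyway.
        return [self._fetch(entry[-1])[3] for entry in self._entries(country_id)]


t = ToyTable(ROWS)
covered = t.select_state_id(1)
reads_covered = t.row_reads
uncovered = t.select_state_id(1, covered=False)
reads_uncovered = t.row_reads - reads_covered
print(covered, reads_covered, uncovered, reads_uncovered)
```

Both paths return the same state_id values; only the number of row reads differs, and on a large table those reads are what separate the covered-index curves.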
sysbench, read by index, LIMIT 20
- SELECT name FROM sbtest WHERE country_id=? LIMIT 20
- Falcon does not optimize LIMIT
- InnoDB scales poorly
- Chart: queries/sec vs. clients (1, 4, 16, 64, 128, 256) for InnoDB, Falcon, SolidDB, PBXT; y-axis 0 to 50,000
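One plausible way an engine ends up unable to stop early on LIMIT is a two-phase index traversal: first collect every matching record number (for example into a sorted set or bitmap), then fetch rows in storage order. A streaming traversal, by contrast, walks index entries in key order and can stop after the 20th row. Whether Falcon's “optimal” index traversal works exactly like the two-phase function below is an assumption of this sketch, not something the slides state; the functions and the sample index are hypothetical and only count how many index entries each approach visits.

```python
import bisect


def streaming_lookup(index, key, limit):
    """Walk matching index entries in order and stop once `limit` rows are found."""
    start = bisect.bisect_left(index, (key,))
    result, visited = [], 0
    for i in range(start, len(index)):
        k, record_no = index[i]
        if k != key or len(result) == limit:
            break
        visited += 1
        result.append(record_no)
    return result, visited


def two_phase_lookup(index, key, limit):
    """Collect every matching record number first, then apply LIMIT while fetching."""
    start = bisect.bisect_left(index, (key,))
    end = bisect.bisect_left(index, (key + 1,))  # assumes integer keys
    # Phase 1: the whole key range is scanned before any row can be returned.
    record_numbers = sorted(record_no for _, record_no in index[start:end])
    # Phase 2: rows are fetched in record-number order; LIMIT only trims this list.
    return record_numbers[:limit], end - start


# Hypothetical index on country_id: 1,000 rows for country 7, 10 rows for country 8.
INDEX = [(7, n) for n in range(1000)] + [(8, n) for n in range(1000, 1010)]

fast, fast_visited = streaming_lookup(INDEX, 7, 20)
slow, slow_visited = two_phase_lookup(INDEX, 7, 20)
print(fast_visited, slow_visited)
```

Both return the same 20 record numbers, but the two-phase version touches every matching entry, so its cost grows with the number of matches rather than with the LIMIT.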
Sysbench OLTP
- Data size: 100,000,000 rows, ~25 GB
- Uniform distribution
- I/O-bound load
- Read/write transactions
- Reduced available memory by locking 12 GB out of 16 GB
Sysbench OLTP, time to load data
- Using multi-value INSERTs rather than LOAD DATA INFILE
- Solid and Falcon are even slower than InnoDB, which is known to be slow compared to MyISAM for data load
- Chart: load time, sec (InnoDB, SolidDB, PBXT, Falcon); bar values 3364, 2880, 1930, 1237
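The slide does not show the loader itself. As a hedged illustration of what "multi-value INSERTs" means, the Python sketch below batches rows into statements of the form `INSERT ... VALUES (...), (...)`. The function names and the batch size are hypothetical, and the quoting in `sql_literal` is a toy that assumes MySQL's default backslash escaping (it would be wrong under `NO_BACKSLASH_ESCAPES`); a real loader should let the client library bind or escape values.

```python
def sql_literal(value):
    """Render an int or str as a MySQL literal (toy escaping, see lead-in)."""
    if isinstance(value, int):
        return str(value)
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def multi_value_inserts(table, columns, rows, batch_size):
    """Yield one INSERT ... VALUES (...), (...) statement per batch of rows."""
    head = f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ", ".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in batch
        )
        yield head + values + ";"


rows = [(1, "alice"), (2, "o'brien"), (3, "carol")]
statements = list(multi_value_inserts("sbtest", ["id", "name"], rows, batch_size=2))
for statement in statements:
    print(statement)
```

Batching amortizes per-statement parsing and round trips, but each row still goes through the storage engine's normal insert path, unlike LOAD DATA INFILE, which is one reason the load times above differ so much between engines.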
Sysbench OLTP, data size
- Comparison of storage for char and varchar columns in the table
- Falcon uses dynamic-length rows anyway
- PBXT surprisingly has the same huge size in both cases
- Chart: data size, GB, char vs. varchar, for InnoDB, SolidDB, PBXT and Falcon; bar values 26.44, 23.03, 22.51, 14.8, 9.6, 8.71, 8.71
Sysbench OLTP, results, I/O bound
- Memory limited to 4 GB, 2 GB for buffers
- InnoDB and SolidDB benefit from clustering by primary key
- All but Falcon scale well for an IO-bound workload with this number of hard drives
- Chart: transactions/sec at 1, 4 and 64 clients for InnoDB, SolidDB, PBXT, Falcon; bar values 46.24, 30.14, 26.11, 22.33, 19.06, 12.77, 10.62, 10.3, 5.8, 5.71, 4.86, 3.87
Selected sqlbench results
- Single operation repeated N times; total time in seconds, less is better

Operation                            | innodb_ | pbxt_fa | soliddb
alter_table_add (100)                |    8.00 |    3.00 |   32.00
count (100)                          |   12.00 |    8.00 |   28.00
count_distinct (1000)                |    6.00 |    8.00 |   74.00
count_distinct_2 (1000)              |   11.00 |   11.00 |   16.00
count_group_on_key_parts (1000)      |    7.00 |   10.00 |   83.00
count_on_key (50100)                 |   70.00 |   94.00 |  210.00
delete_all_many_keys (1)             |   17.00 |    2.00 |   28.00
insert (350768)                      |    6.00 |    5.00 |   21.00
outer_join (10)                      |   14.00 |    7.00 |   61.00
select_key2_return_prim (200000)     |   30.00 |   29.00 |   25.00
select_many_fields (2000)            |    8.00 |    6.00 |    5.00
update_big (10)                      |   18.00 |   56.00 |  727.00
update_of_key_big (501)              |   19.00 |    6.00 |  165.00
update_of_primary_key_many_keys (256 |   44.00 |   17.00 |   55.00
update_with_key_prefix (100000)      |   19.00 |    8.00 |   10.00
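Because the wins in this table are spread across engines, it helps to tally which column has the lowest time per test. The Python sketch below does that directly from the numbers above, keeping the column names exactly as they appear in the table. It is a simple unweighted win count that ignores how large each margin is and how many repetitions each test ran, so it complements rather than replaces the per-test numbers.

```python
# Total seconds per test (lower is better), copied from the table above.
ENGINES = ("innodb_", "pbxt_fa", "soliddb")
RESULTS = {
    "alter_table_add": (8, 3, 32),
    "count": (12, 8, 28),
    "count_distinct": (6, 8, 74),
    "count_distinct_2": (11, 11, 16),
    "count_group_on_key_parts": (7, 10, 83),
    "count_on_key": (70, 94, 210),
    "delete_all_many_keys": (17, 2, 28),
    "insert": (6, 5, 21),
    "outer_join": (14, 7, 61),
    "select_key2_return_prim": (30, 29, 25),
    "select_many_fields": (8, 6, 5),
    "update_big": (18, 56, 727),
    "update_of_key_big": (19, 6, 165),
    "update_of_primary_key_many_keys": (44, 17, 55),
    "update_with_key_prefix": (19, 8, 10),
}


def fastest(times):
    """Return every engine that shares the lowest time for one test."""
    best = min(times)
    return [engine for engine, t in zip(ENGINES, times) if t == best]


wins = {engine: 0 for engine in ENGINES}
ties = []
for test, times in RESULTS.items():
    winners = fastest(times)
    if len(winners) == 1:
        wins[winners[0]] += 1
    else:
        ties.append(test)

print(wins, ties)
```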
Conclusion
- All reviewed storage engines except InnoDB are currently too unstable for production use. SolidDB comes closest.
- InnoDB is still the winner in the majority of tests
- Falcon has severe issues with LIMIT optimization and IO-bound scalability
- PBXT and Falcon win in certain tests
- SolidDB is currently an outsider in terms of performance
- Need to revisit when production versions of all storage engines are ready.
The End
- Thanks for coming!
- Slides will be published at http://www.mysqlperformanceblog.com/
- Feel free to approach us with your questions
- MySQL Performance Optimization Consulting available
- http://www.mysqlperformanceblog.com/mysql-consulting/
Sysbench OLTP, results, char
- CPU bound: data size comparable with memory size
- Chart: transactions/sec at 1, 4 and 64 clients for InnoDB, SolidDB, PBXT, Falcon; bar values 36.71, 34.77, 29.36, 29.1, 25.11, 20.4, 18.75, 17.51, 17.27, 15.15, 13.81, 8.87
