Cassandra Anti-Patterns
Matthew F. Dennis // @mdennis
No OpenJDK
No Blackdown (anyone still use this?)
Etc., etc., etc.; just use the Sun (Oracle) JVM
At least u22, but in general the latest release (unless you have specific reasons otherwise)
Don't put the commit log and data directories on the same set of spindles
Commit log gets a single spindle entirely to itself (standard consumer SATA disks easily sustain > 80 MB/s in sequential writes)
SSDs have no seek time
EC2 ephemeral drives are still virtualized (but not the same as EBS)
On EC2 or SSDs: use one RAID set for both the commit log and data directories
Not predictable
Freezes are common
Throughput limited in many cases
Stripe them
Both commit log and data directory on the same RAID set
6-8 GB is good (assuming sufficient RAM on your boxen)
10-12 GB is possible and in some circumstances correct
16 GB == max JVM heap size
> 16 GB => badness
JVM heap ~= boxen RAM => badness (always)
[Figure: GC Suckage, with labels at ~6GB, ~10GB and ~16GB]
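
The heap rules of thumb above can be written down as a small check. This is only a sketch: the function name `assess_heap` and the `ram_fraction_limit` threshold for "heap ~= RAM" are assumptions made for illustration, not values from the talk, and heaps below 6 GB or between the stated ranges are bucketed by a guess.

```python
def assess_heap(heap_gb, ram_gb, ram_fraction_limit=0.75):
    """Classify a JVM heap size using the rules of thumb from the slide.

    `ram_fraction_limit` is a hypothetical cutoff for "heap ~= RAM";
    the slide only says that a heap close to total RAM is always bad.
    """
    if heap_gb > 16:
        return "bad: above the 16 GB maximum"
    if heap_gb >= ram_fraction_limit * ram_gb:
        return "bad: heap too close to total RAM"
    if heap_gb <= 8:
        # The slide names 6-8 GB as good; smaller heaps are lumped in here.
        return "good"
    if heap_gb <= 12:
        return "possible, sometimes correct"
    return "at the upper limit"


if __name__ == "__main__":
    for heap, ram in [(8, 32), (12, 48), (16, 64), (20, 64), (8, 8)]:
        print(f"heap={heap} GB, ram={ram} GB -> {assess_heap(heap, ram)}")
```

The check rejects oversized heaps first, so a 20 GB heap is flagged even on a box with plenty of RAM.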
Timeout / failure => entire mutation must be retried => wasted work
Larger mutations => higher likelihood of timeout
1000 mutations to perform? Do 100 batches of 10 in parallel instead of one batch of 1000
Exact number of rows/batch is variable depending on HW, network, load, etc.; experiment! (10-100 is a good starting point)
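
A minimal sketch of the "100 batches of 10 in parallel" idea, using only the Python standard library. `send_batch` is a hypothetical stand-in for whatever batch mutation call your client library provides, and `batch_size` and `max_workers` are illustrative defaults to tune by experiment.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_batch(batch):
    """Hypothetical placeholder for a client batch write.

    Replace the body with your client's batch mutation call.
    Here it only reports how many rows it was given.
    """
    return len(batch)


def write_in_small_batches(mutations, batch_size=10, max_workers=8):
    """Send `mutations` as many small batches in parallel.

    If one small batch times out, only its rows need a retry,
    instead of the whole set of mutations.
    """
    batches = list(chunked(mutations, batch_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        written = list(pool.map(send_batch, batches))
    return len(batches), sum(written)


if __name__ == "__main__":
    batch_count, row_count = write_in_small_batches(list(range(1000)))
    print(batch_count, row_count)
```

In a real client, the retry logic would wrap `send_batch` so that a failed batch is resubmitted on its own.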
Creates hot spots
Requires babysitting from ops
Not as well tested, nor is it as widely deployed
Always specify your initial token
Auto select doesn't do what you think it does, nor does it do what you want
loadbalance is even worse; it doesn't currently do what you think, what you want, or what it claims
F#@* my cluster would be a much more apt name than loadbalance
A future (next?) release of OPSC will remove your balancing woes
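
One common way to choose initial tokens by hand is to space them evenly around the ring. The sketch below assumes the RandomPartitioner with a token range of 0 to 2**127, which the slides do not state; the function name `initial_tokens` is made up for illustration.

```python
def initial_tokens(node_count, ring_size=2**127):
    """Return evenly spaced tokens, one per node.

    Assumes a RandomPartitioner-style ring of `ring_size` tokens;
    adjust `ring_size` if your partitioner uses a different range.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(initial_tokens(4)):
        print(f"node {node}: initial token {token}")
```

Each value would then be set as that node's initial token before it first joins the cluster.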
Super Columns
10-15 percent performance penalty on reads and writes
Easier / better to use composite columns
0.8.x makes this a lot easier
Done manually in 0.7.x, and still better
Devs working in C* code despise (loathe?) them
API probably won't be deprecated, but the implementation will be replaced behind the scenes with composites (may be OK at that point to use them, but you should probably just use the composite API directly)
Cassandra and DataStax are committed to maintaining the API going forward, even if the implementation changes
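
To illustrate what replacing super columns with composite columns means, here is an in-memory sketch only. It models a row as plain Python dicts and is not Cassandra client code; the helper names `flatten_super_columns` and `slice_by_prefix` are hypothetical.

```python
def flatten_super_columns(row):
    """Turn {super_name: {sub_name: value}} into {(super_name, sub_name): value}.

    The tuple plays the role of a composite column name.
    """
    return {
        (super_name, sub_name): value
        for super_name, sub_columns in row.items()
        for sub_name, value in sub_columns.items()
    }


def slice_by_prefix(columns, prefix):
    """Return columns whose first component equals `prefix`, in sorted order.

    This mimics reading one super column as a slice of composite columns.
    """
    return [(name, columns[name]) for name in sorted(columns) if name[0] == prefix]


if __name__ == "__main__":
    row = {
        "address": {"city": "Austin", "zip": "78701"},
        "name": {"first": "Ann"},
    }
    flat = flatten_super_columns(row)
    print(slice_by_prefix(flat, "address"))
```

Because the composite names sort by their first component, everything that used to live under one super column stays contiguous and can be read with a single slice.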
Race conditions
Abuses/thrashes cache (row, key and page)
Increases latency
Increases IO requirements (by a lot)
Increases size in the client
Winblows
Easier to get help (IRC, email, meetups, etc.)
C* performs better
Better tested
Cheaper
More widely deployed (by a lot)
Q?
Cassandra Anti-Patterns Matthew F. Dennis // @mdennis