Attila-Mihály Balázs
http://hype-free.blogspot.com/
Former malware researcher (low-level guy)
Current Java dev (high-level dude)
Spent the last ~6 months optimizing a large (1 000 000+ LOC) legacy system
Will spend the next 6 months on it too (at least)
Question everything!
Core principles
Demo 1: collections framework
Demo 2, 3, 4: synchronization performance
Demo 5: ugly code, is it worth it?
Demo 6, 7, 8: playing with Strings
Conclusions
Q&A
Selecting efficient algorithms
High-level optimizations (architectural changes)
These are important too! (but require more effort, and we are going for the quick win here)
Core principles
Performance is a balance, an endless game of shifting bottlenecks. No silver bullets here!
[Diagram: your program and the resources it uses: CPU, memory, disk, network]
Compiler (JIT): Java 5 to 6: 100% (1)
Memory: L1/L2 cache, main memory
Disk: cache, RAID, SSD
Network: 10Mbit, 100Mbit, 1000Mbit
Until recently we had it easy (performance doubled every 18 months)
Now we need to do some work
(1) http://java.sun.com/performance/reference/whitepapers/6_performance.html
Core principles
Measure, measure, measure! (before, during, after)
Try using realistic data!
Watch out for the Heisenberg effect (more on this later)
Some things are not intuitive:
Pop quiz: if processing 1000 messages takes 1 second, how long does the processing of 1 message take?
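The question has no single answer: throughput alone does not determine latency. A minimal sketch, assuming an idealized model where messages are processed in rounds by a fixed number of parallel workers (the class name, method name, and the three scenarios are made up for illustration):

```java
class ThroughputVsLatency {
    /**
     * Total wall-clock time in milliseconds to process `messages` messages
     * when each one takes `latencyMillis` and `workers` of them run in parallel.
     * Idealized: no queueing overhead, no contention.
     */
    static long totalMillis(int messages, long latencyMillis, int workers) {
        int rounds = (messages + workers - 1) / workers; // ceiling division
        return rounds * latencyMillis;
    }

    public static void main(String[] args) {
        // Three very different per-message latencies, same overall throughput.
        System.out.println(totalMillis(1000, 1, 1));       // 1 worker, 1 ms per message
        System.out.println(totalMillis(1000, 100, 100));   // 100 workers, 100 ms per message
        System.out.println(totalMillis(1000, 1000, 1000)); // 1000 workers, 1 s per message
    }
}
```

All three scenarios finish 1000 messages in 1000 ms, so "1 ms per message" is only one of many possible answers.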
Core principles
Throughput
Latency
Thread context, context switching
Lock contention
Queueing theory
Profiling
Sampling
L1 cache reference                               0.5 ns
Branch mispredict                                  5 ns
L2 cache reference                                 7 ns
Mutex lock/unlock                                100 ns
Main memory reference                            100 ns
Compress 1K bytes with Zippy                  10,000 ns
Send 2K bytes over 1 Gbps network             20,000 ns
Read 1 MB sequentially from memory           250,000 ns
Round trip within same datacenter            500,000 ns
Disk seek                                 10,000,000 ns
Read 1 MB sequentially from network       10,000,000 ns
Read 1 MB sequentially from disk          30,000,000 ns
Send packet CA->Netherlands->CA          150,000,000 ns
(2) http://research.google.com/people/jeff/stanford-295-talk.pdf
Feasibility
Amdahl's law: The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.
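Amdahl's law is usually written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelized and n is the number of processors. A small sketch to get a feel for the numbers; the class name and the 90% parallel fraction are chosen only for illustration:

```java
class Amdahl {
    /**
     * Upper bound on the speedup when a fraction `parallel` (0..1) of the
     * run time is spread evenly over `processors` processors.
     */
    static double speedup(double parallel, int processors) {
        double sequential = 1.0 - parallel;
        return 1.0 / (sequential + parallel / processors);
    }

    public static void main(String[] args) {
        // Assumption for the example: 90% of the program parallelizes perfectly.
        for (int n : new int[] {1, 2, 4, 8, 1024}) {
            System.out.printf("%4d processors: %.2fx%n", n, speedup(0.9, n));
        }
    }
}
```

With a 10% sequential part the speedup can never reach 10x, no matter how many processors are added.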
Course of action
1. Have a clear (written?), measurable goal: operation X should take less than 100ms in 99.9% of the cases
2. Measure (profile)
3. Is the goal met? If yes: The End
4. Optimize hotspots, go to step 2
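A goal phrased as "less than 100ms in 99.9% of the cases" can be checked mechanically. A rough sketch, assuming a nearest-rank percentile over timings collected with System.nanoTime(); `operationX` is a hypothetical stand-in for the real operation, and JIT warm-up is ignored here:

```java
import java.util.Arrays;

class LatencyGoal {
    /** Nearest-rank percentile: the smallest sample with at least pct% of samples <= it. */
    static long percentile(long[] samples, double pct) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Hypothetical placeholder for the operation being measured. */
    static void operationX() {
        Math.sqrt(System.nanoTime());
    }

    public static void main(String[] args) {
        int runs = 10_000;
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            operationX();
            nanos[i] = System.nanoTime() - start;
        }
        long p999 = percentile(nanos, 99.9);
        long goalNanos = 100_000_000L; // 100 ms expressed in nanoseconds
        System.out.println("99.9th percentile: " + p999 + " ns, goal met: " + (p999 < goalNanos));
    }
}
```

Averages hide the slow cases, which is why the goal is stated (and checked) as a percentile.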
Tools
Wrong data structure (list / array instead of set), hence slooow performance for large data sets (but not for small ones!)
Extra synchronization if used by a single thread only
Not actually thread safe! (only exception safe)
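The list-versus-set point can be reproduced with a small experiment. This is not the code from the demo, just a sketch with made-up names that removes duplicates once with a List lookup and once with a LinkedHashSet:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;

class DedupDemo {
    /** Quadratic: List.contains scans the list for every element. */
    static List<Integer> dedupWithList(List<Integer> input) {
        List<Integer> seen = new ArrayList<>();
        for (Integer x : input) {
            if (!seen.contains(x)) {
                seen.add(x);
            }
        }
        return seen;
    }

    /** Linear on average: hashed lookup, insertion order preserved. */
    static List<Integer> dedupWithSet(List<Integer> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        Random rnd = new Random(42);
        for (int i = 0; i < 10_000; i++) {
            data.add(rnd.nextInt(5_000));
        }
        long t0 = System.nanoTime();
        List<Integer> viaList = dedupWithList(data);
        long t1 = System.nanoTime();
        List<Integer> viaSet = dedupWithSet(data);
        long t2 = System.nanoTime();
        System.out.println("same result: " + viaList.equals(viaSet));
        System.out.println("list: " + (t1 - t0) / 1_000 + " us, set: " + (t2 - t1) / 1_000 + " us");
    }
}
```

Try it with 10 elements and then with 100 000: the difference is invisible for small inputs, which is why realistic data matters when measuring.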
Demo 1: lessons
Use existing classes
Use realistic sample data
Thread safety is hard!
Heisenberg (observer) effect
If I have N units of work and use 4 threads, it must be faster than using a single thread, right?
What does lock contention look like?
What does a synchronization train(wreck) look like?
Demo 2, 3, 4: lessons
ReadWriteLock
java.util.concurrent.*
Use realistic sample data (too short / too long units of work)
Sometimes throwing a threadpool at it makes it worse!
Consider using a private copy of the variable for each thread
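One way to read the "private copy" advice, as a sketch rather than the demo's actual code: compare threads hammering one shared counter with threads counting in a local variable and handing back a subtotal once. All names are invented for the example, and the JIT may optimize the private loop heavily, so treat the timings as indicative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

class PrivateCopyDemo {
    /** Runs the same task on `threads` threads and returns the sum of their results. */
    static long runAndSum(int threads, Callable<Long> task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Every increment touches the same shared variable. */
    static long countShared(int threads, int perThread) {
        AtomicLong shared = new AtomicLong();
        runAndSum(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                shared.incrementAndGet();
            }
            return 0L;
        });
        return shared.get();
    }

    /** Each thread counts in a private local and reports its subtotal once. */
    static long countPrivate(int threads, int perThread) {
        return runAndSum(threads, () -> {
            long local = 0;
            for (int i = 0; i < perThread; i++) {
                local++;
            }
            return local;
        });
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long a = countShared(4, 1_000_000);
        long t1 = System.nanoTime();
        long b = countPrivate(4, 1_000_000);
        long t2 = System.nanoTime();
        System.out.println("shared:  " + a + " in " + (t1 - t0) / 1_000 + " us");
        System.out.println("private: " + b + " in " + (t2 - t1) / 1_000 + " us");
    }
}
```

Both variants produce the same total; only the amount of cross-thread coordination differs. Measure it on your own hardware before drawing conclusions.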
Parsing a logfile
Demo 5: lessons
Demo 6: String.substring
Demo 6: Lesson
Demo 7: Lessons
Use a WeakHashMap for caching (don't forget to synchronize!)
Use String.equals (not ==)
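A minimal sketch of both lessons, assuming a simple compute-on-miss cache. The class and method names are hypothetical; the only library calls used are Collections.synchronizedMap and WeakHashMap:

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

class CacheDemo {
    // WeakHashMap is not thread safe, so wrap it before sharing it between threads.
    // Entries can disappear once their key is no longer strongly referenced elsewhere.
    static final Map<Object, String> CACHE =
            Collections.synchronizedMap(new WeakHashMap<Object, String>());

    /** Returns the cached value for `key`, computing and storing it on a miss. */
    static String describe(Object key) {
        String cached = CACHE.get(key);
        if (cached == null) {
            // Stand-in for an expensive computation. Two threads may both compute
            // here (get and put are not one atomic step), which is harmless for a cache.
            cached = "value for " + key;
            CACHE.put(key, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        Object key = new Object();
        // Same instance both times while `key` is still strongly referenced.
        System.out.println(describe(key) == describe(key)); // true

        String a = new String("demo");
        String b = new String("demo");
        System.out.println(a == b);      // false: two distinct objects
        System.out.println(a.equals(b)); // true: same characters
    }
}
```

Keep in mind that the cached values must not hold a strong reference to their own key, otherwise the entry is never collected.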
Demo 8: charsets
Demo 8: lessons
Conclusions
Measure twice, cut once
Don't trust advice you didn't test! (including mine)
Most of the time you don't need to sacrifice clean code for performant code
Conclusions
Slides:
Source code:
Resources