Virgil Palanciuc
Just-in-Time Compilation
Challenge: minimize the sum of a program's compile time and execution time
Continuous Compilation (idea: mix interpretation and compilation)
  Instead of pausing to compile, simply fire up the interpreter.
  In the background, start compiling code, with emphasis on the compilation units that are interpreted often.
  When code is compiled, jump to the native code instead.
Smart Just-in-Time
  Estimate whether compilation is actually worthwhile.
  Estimate compile time as a function of code size.
  Observe the time spent interpreting code.
  If compilation is worthwhile (heuristic), stop interpreting and compile instead.
  No second processor is needed.
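The Smart Just-in-Time heuristic above can be sketched as follows. This is only an illustration: the linear cost model, the constants `COMPILE_COST_PER_BYTE` and `SPEEDUP`, and the function names are assumptions, not taken from any real JIT.

```javascript
// Hypothetical cost model; both constants are illustrative assumptions.
const COMPILE_COST_PER_BYTE = 50; // estimated compile time units per byte of code
const SPEEDUP = 10;               // assumed ratio of interpreted time to native time

function makeUnit(name, codeSize) {
  return { name, codeSize, interpretedTime: 0, compiled: false };
}

// Estimate compile time as a (here: linear) function of code size.
function estimateCompileTime(unit) {
  return unit.codeSize * COMPILE_COST_PER_BYTE;
}

// Record one interpreted run and decide whether compiling is now worthwhile.
function recordInterpretedRun(unit, elapsed) {
  unit.interpretedTime += elapsed;
  // Heuristic: assume the unit will keep running for as long as it already has,
  // so the expected saving is the observed time minus the native time for it.
  const expectedSaving = unit.interpretedTime * (1 - 1 / SPEEDUP);
  if (!unit.compiled && expectedSaving > estimateCompileTime(unit)) {
    unit.compiled = true; // stop interpreting, compile instead
  }
  return unit.compiled;
}
```

A small, rarely executed unit never crosses the threshold and stays interpreted, which is the point of the heuristic.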
Java source code compiles to Java bytecode.
Bytecode is verifiably secure, compact, and platform independent.
A virtual machine for the target platform executes the program.
network (?)
Reduces overall running time when compared to an interpreter.
Handles dynamic class loading.
Register allocation is extremely important.
Just-in-time compilation is usually better than interpretation.
Simple versions of standard algorithms provide most of the benefit.
Have been around for quite a while: Perl, JavaScript, Python, Ruby, Tcl
Popular opinions:
  Unfixably slow
  Not possible to create good IDE tools
  Maintenance traps as the codebase grows larger
Observation: techniques for creating tools for dynamic languages are similar to those for improving performance
Lack of effort
Object and variable types can change
Methods can be added/removed
Target machine feature mismatches
Example
C: method calls can be inlined
C++: dynamic method dispatch makes this more difficult
JavaScript: even more difficult (method lookup in a dictionary)
Widespread belief that Java is just as fast as C++
JavaScript: Google V8, SpiderMonkey/TraceMonkey
Actual design and system perspective requires more thought, micro-optimization is accessible
Java: slower than C++ in benchmarks, faster overall, especially on multicore/SMT
Ruby on Rails: 20% faster than Struts, although Ruby is way slower than Java
At a glance:
Not class-based!
OOP types: prototype, class/single dispatch, class/multiple dispatch
  Prototype: JavaScript; no classes, each object has its own class
  Single dispatch: C++, Java (virtual methods; dispatch is based on this)
  Multiple dispatch: dispatch is based on all arguments
Lexical scoping, first-class functions, closures
ECMAScript edition 4: optional types
Sudden focus on improving performance
Browser war: JavaScript interpreters get faster and faster
Profile to get probabilistic information
Speculate on what happens frequently
Apply simple/fast optimizations on the frequent dynamic types/values
Inlining -> polymorphic inline cache
Heuristics can apply to the internal representation
Parse the program into a tree of statements and expressions.
Visit the nodes in the tree, performing their operations and propagating execution state.
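A minimal sketch of the tree-walking approach described above. The node shapes (`num`, `var`, `add`, `mul`, `assign`) and the `evaluate` function are made up for illustration, and the tree is built by hand rather than parsed.

```javascript
// Visit a node, perform its operation, and propagate state through `env`.
function evaluate(node, env) {
  switch (node.type) {
    case "num":
      return node.value;
    case "var":
      return env[node.name];
    case "add":
      return evaluate(node.left, env) + evaluate(node.right, env);
    case "mul":
      return evaluate(node.left, env) * evaluate(node.right, env);
    case "assign":
      env[node.name] = evaluate(node.value, env);
      return env[node.name];
    default:
      throw new Error("unknown node type: " + node.type);
  }
}

// Hand-built tree for: c = 2 * (a + b)
const program = {
  type: "assign",
  name: "c",
  value: {
    type: "mul",
    left: { type: "num", value: 2 },
    right: {
      type: "add",
      left: { type: "var", name: "a" },
      right: { type: "var", name: "b" },
    },
  },
};

const env = { a: 42, b: 7, c: 0 };
evaluate(program, env);
```

Every evaluation re-dispatches on the node type and chases pointers through the tree, which is the overhead that bytecode and native code generation aim to remove.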
Bytecode interpreters
Eliminate nodes that represent just syntax structure
Bring the representation closer to the machine
Enable additional optimizations on bytecode (as done on the JVM)
Google V8
Google V8 is the JS engine from Google Chrome.
3 key areas to V8's performance:
  Stop-the-world GC: eliminates synchronization needs, reduces complexity
  Generational GC, 2 generations: rapidly deletes short-lived objects
  Computes hidden classes (more on the next slides)
  V8 skips bytecode and generates machine code
  On initial execution, determine the hidden class
  Patch the inline cache code to use it
point.x
# ebx = the point object
cmp [ebx,<hidden class offset>],<cached hidden class>
jne <inline cache miss>
mov eax,[ebx, <cached x offset>]
Fixed offset to access a property
Typically a single memory load to read/write a property
Layout unknown a priori
Typically a hash lookup to find a property's memory location
Idea: objects don't really change that much; use hidden classes to cache the memory layout
Not a new idea, used first in Self (at SUN; 1989!)
Example:
function Point(x, y) {
  this.x = x;
  this.y = y;
}
Initially the Point object has no properties, so the newly created object refers to the initial class C0.
When property x is added, V8 follows the hidden class transition from C0 to C1 and writes the value of x at the offset specified by C1.
When property y is added, V8 follows the hidden class transition from C1 to C2 and writes the value of y at the offset specified by C2.
The runtime behavior of most JavaScript programs results in a high degree of structure sharing using the above approach.
Two advantages of using hidden classes:
  Property access does not require a dictionary lookup.
  Enables inline caching.
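The C0 -> C1 -> C2 transitions can be modeled with a toy data structure. This is a sketch of the idea only: `HiddenClass`, `ToyObject` and `makePoint` are hypothetical names, and none of this reflects V8's actual internal layout.

```javascript
// Toy hidden class: maps property names to slot offsets and remembers
// which hidden class to move to when a given property is added.
class HiddenClass {
  constructor(offsets) {
    this.offsets = offsets;       // Map: property name -> slot index
    this.transitions = new Map(); // Map: added property name -> next HiddenClass
  }

  // Follow the transition for `name`, creating it the first time only.
  addProperty(name) {
    if (!this.transitions.has(name)) {
      const offsets = new Map(this.offsets);
      offsets.set(name, offsets.size);
      this.transitions.set(name, new HiddenClass(offsets));
    }
    return this.transitions.get(name);
  }
}

const C0 = new HiddenClass(new Map()); // initial class: no properties

class ToyObject {
  constructor() {
    this.hiddenClass = C0;
    this.slots = [];
  }
  set(name, value) {
    if (!this.hiddenClass.offsets.has(name)) {
      this.hiddenClass = this.hiddenClass.addProperty(name);
    }
    this.slots[this.hiddenClass.offsets.get(name)] = value;
  }
  get(name) {
    return this.slots[this.hiddenClass.offsets.get(name)];
  }
}

function makePoint(x, y) {
  const p = new ToyObject();
  p.set("x", x); // C0 -> C1
  p.set("y", y); // C1 -> C2
  return p;
}

const p1 = makePoint(1, 2);
const p2 = makePoint(3, 4);
```

Because both points add the same properties in the same order, they end up sharing one hidden class, so a cached offset for `x` is valid for either object.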
Inline caching
Initially, the call site is uninitialized and a method lookup is performed.
On the first call, remember the object type (i.e. the ToString address) and change the state to monomorphic.
Always perform the call directly, as long as the object type does not change.
If the object type changes, switch back to uninitialized.
With a polymorphic inline cache, switch state from monomorphic to polymorphic instead.
If even more object types occur, change to megamorphic and disable inline caching.
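The state transitions above can be sketched as a small state machine. This models the polymorphic variant only; `CallSite`, `MAX_POLYMORPHIC`, and the use of `obj.constructor` as a stand-in for the object type are assumptions for illustration, not how any particular engine implements it.

```javascript
const MAX_POLYMORPHIC = 4; // assumed limit before giving up on caching

class CallSite {
  constructor(methodName) {
    this.methodName = methodName;
    this.state = "uninitialized";
    this.cache = new Map(); // object type -> resolved method
    this.lookups = 0;       // counts slow-path method lookups
  }

  // Slow path: the full method lookup.
  slowLookup(obj) {
    this.lookups++;
    return obj[this.methodName];
  }

  call(obj) {
    if (this.state === "megamorphic") {
      return this.slowLookup(obj).call(obj); // caching disabled
    }
    const type = obj.constructor;
    let method = this.cache.get(type);
    if (method === undefined) {
      method = this.slowLookup(obj);
      this.cache.set(type, method);
      if (this.cache.size === 1) {
        this.state = "monomorphic";
      } else if (this.cache.size <= MAX_POLYMORPHIC) {
        this.state = "polymorphic";
      } else {
        this.state = "megamorphic";
        this.cache.clear();
      }
    }
    return method.call(obj);
  }
}

class A { describe() { return "A"; } }
class B { describe() { return "B"; } }
const site = new CallSite("describe");
```

Repeated calls with the same type hit the cache and skip the lookup; the `lookups` counter makes that visible.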
Uses trace trees instead of a control flow graph as the internal representation of the program structure
Basically, it watches for commonly repeated actions and optimizes the hot paths
Easy to perform function inlining
Easier to perform type inference (e.g. determine whether a+b is string concatenation or number addition)
Looping overhead grossly diminished
Figures: traditional CFG; trace tree
Constructed and extended lazily
Designed to represent loops (performance-critical parts)
Anchor (loop header) discovered by dynamic profiling, not static analysis
Can include instructions from called methods (inlining)
Can have side exits
Register allocation
Type specialization
Speculate on variable types based on historic info
Insert guards to go to interpreted mode if the type changes
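A sketch of speculation plus guards, under simplifying assumptions: `specializeAdd`, `interpretAdd` and the `stats.sideExits` counter are hypothetical, and in plain JavaScript both paths end up using `+`, so only the control structure (profile, specialize, guard, side exit) is illustrated.

```javascript
// Generic path, standing in for "go back to the interpreter".
function interpretAdd(a, b) {
  return a + b; // may be number addition or string concatenation
}

// Build a specialized version of a + b from observed (a, b) pairs.
function specializeAdd(profile) {
  const allNumbers = profile.every(
    ([a, b]) => typeof a === "number" && typeof b === "number"
  );
  if (!allNumbers) {
    return (a, b) => interpretAdd(a, b); // nothing to speculate on
  }
  return function addNumbers(a, b, stats) {
    // Guard: the speculation only holds while both operands are numbers.
    if (typeof a !== "number" || typeof b !== "number") {
      stats.sideExits++;         // guard failed: take the side exit
      return interpretAdd(a, b); // fall back to the generic path
    }
    return a + b; // specialized path: known to be number addition
  };
}

const stats = { sideExits: 0 };
const fastAdd = specializeAdd([[1, 2], [3, 4]]);
```

Being wrong is cheap here: a failed guard costs one extra check and a fallback, not a wrong result.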
Results
Modern IDE expectations: autocomplete, jump-to-definition, browsing, refactoring
First hints: syntax
var x = 12.5;
var y = x;
Can apply standard techniques used for optimization!
Make assumptions based on what you know (e.g., variable types don't change)
Monte Carlo methods?
It's not the end of the world if you're wrong
Java/.NET can't do it perfectly, either! (How about dynamic class loading/reflection? Struts/XML configuration files? DB persistence layers?)
Conclusions:
Still plenty of low-hanging fruit in the area of dynamic language research
Can apply optimization theory to IDEs
Credits
http://code.google.com/apis/v8/design.html
http://andreasgal.com/
Steve Yegge's speech at the Stanford EE Dept Computer Systems Colloquium, May 2008: http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html
http://www.cs.berkeley.edu/~jcondit/cs265/expert.html
BACKUP SLIDES
Each frame contains local variables and an operand stack
Instruction set:
  Load/store between locals and operand stack
  Arithmetic on operand stack
  Object creation and method invocation
  Array/field accesses
  Control transfers and exceptions
The type of the operand stack at each program point is known at compile time
Example:
iconst 2
iload a
iload b
iadd
imul
istore c
Computes: c := 2 * (a + b)
Execution trace (operand stack shown top first):
Instruction     a     b     c     Stack
(start)         42    7     0     (empty)
iconst 2        42    7     0     2
iload a         42    7     0     42 2
iload b         42    7     0     7 42 2
iadd            42    7     0     49 2
imul            42    7     0     98
istore c        42    7     98    (empty)
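The bytecode example can be executed by a tiny operand-stack interpreter. This is a sketch: the `[opcode, argument]` encoding and the `run` function are made up for illustration, and locals are addressed by name here, whereas real JVM bytecode uses numbered local variable slots.

```javascript
// Interpret a list of [opcode, argument] pairs against named locals.
function run(code, locals) {
  const stack = []; // operand stack; the top is the end of the array
  for (const [op, arg] of code) {
    switch (op) {
      case "iconst": // push a constant
        stack.push(arg);
        break;
      case "iload": // push the value of a local
        stack.push(locals[arg]);
        break;
      case "istore": // pop into a local
        locals[arg] = stack.pop();
        break;
      case "iadd": { // pop two ints, push their 32-bit sum
        const b = stack.pop();
        const a = stack.pop();
        stack.push((a + b) | 0);
        break;
      }
      case "imul": { // pop two ints, push their 32-bit product
        const b = stack.pop();
        const a = stack.pop();
        stack.push(Math.imul(a, b));
        break;
      }
      default:
        throw new Error("unknown opcode: " + op);
    }
  }
  return locals;
}

const code = [
  ["iconst", 2],
  ["iload", "a"],
  ["iload", "b"],
  ["iadd"],
  ["imul"],
  ["istore", "c"],
];
const result = run(code, { a: 42, b: 7, c: 0 });
```

With a = 42 and b = 7 the stack goes through the same states as in the slides, and c ends up as 98.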
Introduced by the Intel VTune JIT compiler ([Adl-Tabatabai 98])
Idea: use a mimic stack to simulate the execution of the operand stack
Instead of the actual values, the mimic stack holds the location of the values
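A rough sketch of the mimic stack idea, not the Intel compiler's actual algorithm: the `compile` function, the operand descriptors (`imm`, `mem`, `reg`), the pseudo-assembly syntax and the unlimited supply of virtual registers are all assumptions. It also ignores the case where a local is overwritten while a descriptor for it is still on the mimic stack.

```javascript
// Translate stack bytecode to pseudo-instructions. The mimic stack holds
// where each operand lives (immediate, memory, register), not its value.
function compile(code) {
  const mimic = []; // operand locations
  const out = [];   // emitted pseudo-instructions
  let nextReg = 0;

  const show = (o) =>
    o.kind === "imm" ? String(o.value)
    : o.kind === "mem" ? "[" + o.name + "]"
    : o.name;

  // Materialize an operand in a register only when it is really needed.
  const toReg = (o) => {
    if (o.kind === "reg") return o;
    const r = { kind: "reg", name: "r" + nextReg++ };
    out.push("mov " + r.name + ", " + show(o));
    return r;
  };

  for (const [op, arg] of code) {
    switch (op) {
      case "iconst": // no code emitted, just remember the immediate
        mimic.push({ kind: "imm", value: arg });
        break;
      case "iload": // no code emitted, just remember the memory location
        mimic.push({ kind: "mem", name: arg });
        break;
      case "iadd":
      case "imul": {
        const right = mimic.pop();
        const left = toReg(mimic.pop());
        out.push((op === "iadd" ? "add " : "imul ") + left.name + ", " + show(right));
        mimic.push(left); // the result now lives in a register
        break;
      }
      case "istore":
        out.push("mov [" + arg + "], " + toReg(mimic.pop()).name);
        break;
      default:
        throw new Error("unknown opcode: " + op);
    }
  }
  return out;
}

const native = compile([
  ["iconst", 2],
  ["iload", "a"],
  ["iload", "b"],
  ["iadd"],
  ["imul"],
  ["istore", "c"],
]);
```

The loads and the constant push generate no code of their own; their operands are folded into the arithmetic instructions that consume them.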
Operand types: Immediate, Memory, Register, FP Stack, Field, Array, Static, Stack, Constant
register-based architecture
Disadvantages
Exception Handling
Problem: exceptions aren't thrown often, but they complicate control flow
Solution: on-demand exception translation
  Maintain a mapping of native code addresses to original bytecode addresses
  When an exception occurs, look up the original address and jump to the appropriate exception handler
  Results in less compilation overhead and bigger basic blocks
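The address mapping can be sketched as a sorted table searched on demand. All addresses, the `addressMap` and `handlers` tables, and the function names are hypothetical values chosen for illustration.

```javascript
// Sorted by nativeStart: native code from this address onward was
// generated from the bytecode instruction at bytecodePc.
const addressMap = [
  { nativeStart: 0x1000, bytecodePc: 0 },
  { nativeStart: 0x1010, bytecodePc: 3 },
  { nativeStart: 0x1024, bytecodePc: 7 },
];

// Exception table in bytecode terms: handler covers [startPc, endPc).
const handlers = [{ startPc: 3, endPc: 7, handlerPc: 20 }];

// Binary search for the last entry whose nativeStart <= nativeAddr.
function bytecodePcFor(nativeAddr) {
  let lo = 0;
  let hi = addressMap.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (addressMap[mid].nativeStart <= nativeAddr) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found < 0 ? null : addressMap[found].bytecodePc;
}

// Only called when an exception actually occurs.
function findHandler(nativeAddr) {
  const pc = bytecodePcFor(nativeAddr);
  if (pc === null) return null;
  const h = handlers.find((e) => e.startPc <= pc && pc < e.endPc);
  return h ? h.handlerPc : null;
}
```

The normal execution path pays nothing for this: the table is consulted only when an exception is thrown, so the compiled code does not need explicit control flow edges to every handler.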
Null check elimination
Array bounds check elimination
Global common subexpression elimination
Type inclusion test optimization
Static method call inlining
Dynamic method call resolution
... none of which matter very much