Javascript Optimization and Just in Time Compilation

Just-in-time compilation (JIT)
Virgil Palanciuc
Just-in-Time Compilation
The compiler runs at program execution time

Popularized by Java virtual machine implementations
Preserves the interpretive character of the language (portability)
Challenge: minimize the sum of the program's compile time and execution time
Continuous Compilation (idea: mix interpretation and compilation)
Instead of pausing to compile, simply fire up the interpreter
In the background, start compiling code, with emphasis on the compilation units that are interpreted often
When code has been compiled, jump to the native code instead
Smart Just-in-Time
Estimate whether compilation is actually worthwhile
Estimate compile time as a function of code size
Observe the time spent interpreting the code
If compilation is worthwhile (heuristic), stop interpreting and compile instead
No second processor required
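The Smart JIT decision can be sketched as a small cost model. This is a minimal sketch under assumed numbers: compile time is modeled as linear in code size (`COMPILE_COST_PER_BYTE` is a made-up constant), and `CompilationUnit`, `shouldCompile` and `run` are hypothetical names, not the API of any real VM.

```javascript
// Hypothetical cost model: compiling costs a fixed amount per byte of bytecode.
const COMPILE_COST_PER_BYTE = 0.5; // assumed time units per bytecode byte

class CompilationUnit {
  constructor(name, codeSize) {
    this.name = name;
    this.codeSize = codeSize; // size of the unit's bytecode, in bytes
    this.interpretTime = 0;   // time observed while interpreting it so far
    this.compiled = false;
  }

  // Estimated compile time, as a function of code size.
  estimatedCompileTime() {
    return this.codeSize * COMPILE_COST_PER_BYTE;
  }

  // Heuristic: once interpreting the unit has cost as much as compiling it
  // would, compilation is judged worthwhile.
  shouldCompile() {
    return !this.compiled && this.interpretTime >= this.estimatedCompileTime();
  }
}

// Simulate repeated calls: interpret until the heuristic fires, then
// use the (pretend) native version for every later call.
function run(unit, calls, interpretCostPerCall) {
  const log = [];
  for (let i = 0; i < calls; i++) {
    if (unit.compiled) {
      log.push("native");
      continue;
    }
    unit.interpretTime += interpretCostPerCall;
    log.push("interpreted");
    if (unit.shouldCompile()) {
      unit.compiled = true; // stop interpreting and compile instead
    }
  }
  return log;
}

const hot = new CompilationUnit("hotLoop", 40); // estimated compile time: 20
console.log(run(hot, 6, 5));
```

The rule used here (compile once interpreting has cost as much as compiling would) is only one simple heuristic; a real VM would also weigh how much faster the native code runs and how often the unit is likely to execute again.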
Java's Programming Environment

Java source code compiles to Java bytecode
Bytecode is verifiably secure, compact, and platform independent
A virtual machine for the target platform executes the program
[Figure: Java source code -> compiler -> Java bytecode -> (possibly over a network) -> JIT compiler -> native code -> native execution -> output]
Java: Summary and conclusions
JIT compilation is effective

Reduces overall running time compared to an interpreter
Handles dynamic class loading
Register allocation is extremely important
"Pretty good" is usually good enough
Simple register allocation schemes are just as good as complex ones
Just-in-time compilation is usually better than interpretation
Simple versions of standard algorithms provide most of the benefit
Especially register allocation!
Introducing: dynamic languages

Have been around for quite a while: Perl, JavaScript, Python, Ruby, Tcl
Popular opinions:

Unfixably slow
Not possible to create good IDE tools
Maintenance traps as the codebase grows larger
Observation: the techniques for creating tools for dynamic languages are similar to those for improving performance
Why are dynamic languages slow?
Lack of effort
Used for scripting, i.e., I/O-bound work
Hard to compile with traditional techniques

Object and variable types can change
Methods can be added/removed
Mismatches with target machine features
Example

C: method calls can be inlined
C++: dynamic method dispatch makes this more difficult
JavaScript: even more difficult (method lookup in a dictionary)
Dynamic = lack of performance?

Warning: controversial slide
Technology has evolved

Widespread belief that Java is just as fast as C++
JavaScript: Google V8, SpiderMonkey/TraceMonkey
Cultural problem: programmers often prefer to micro-optimize
Getting the actual design right from a system perspective requires more thought; micro-optimization is more accessible
Global optimizations always trump benchmarks!

Java: slower than C++ in benchmarks, but faster overall, especially on multicore/SMT
Ruby on Rails is 20% faster than Struts, although Ruby is much slower than Java
Case study: Javascript
At a glance:
Java-like syntax, prototype-based OOP
Not class-based! OOP flavors: prototype-based, class-based with single dispatch, class-based with multiple dispatch
Prototype-based: JavaScript; no classes, each object effectively has its own "class"
Single dispatch: C++, Java (virtual methods; the method is selected based on this)
Multiple dispatch: the method is selected based on all arguments
Lexical scoping, first-class functions, closures
ECMAScript edition 4: optional types
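A minimal sketch of prototype-based OOP and closures, using only standard JavaScript (`Object.create`, `Object.getPrototypeOf`); the objects `animal`, `dog` and `cat` are illustrative.

```javascript
// A prototype is just an ordinary object that other objects delegate to.
const animal = {
  describe() {
    return this.name + " says " + this.sound();
  },
  sound() {
    return "...";
  },
};

// No class declaration: create objects that delegate to `animal`.
const dog = Object.create(animal);
dog.name = "Rex";
dog.sound = function () { return "woof"; };

const cat = Object.create(animal);
cat.name = "Tom";
// Methods can be attached to a single object at run time, so in effect
// each object may have its own "class".
cat.sound = function () { return "meow"; };

// Closure: a first-class function capturing a lexically scoped variable.
function makeCounter() {
  let count = 0;
  return function () {
    count += 1;
    return count;
  };
}

console.log(dog.describe()); // Rex says woof
console.log(cat.describe()); // Tom says meow
```

Because `describe` is found by walking the prototype chain at run time, and `dog.sound` can be deleted or replaced at any moment, a compiler cannot assume a fixed method table the way a C++ or Java compiler can.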
Ajax caused a popularity surge

Sudden focus on improving performance
Browser war: JavaScript interpreters get faster and faster
Efficient JIT compilation

Must think differently: the trick is to use heuristics

Profile to get probabilistic information
Speculate on what happens frequently
Apply simple/fast optimizations to the frequent dynamic types/values
Inlining -> polymorphic inline cache
Heuristics can also apply to the internal representation
TraceMonkey, Firefox's response to Chrome's V8
First step: bytecode execution
Traditional interpreters: abstract syntax tree walkers
Parse the program into a tree of statements and expressions
Visit the nodes in the tree, performing their operations and propagating execution state
Bytecode interpreters

Eliminate nodes that represent just syntax structure
Bring the representation closer to the machine
Enable additional optimizations on bytecode (as done on the JVM)
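To make the difference between the two interpreter styles concrete, here is a minimal sketch that evaluates `2 * (a + b)` both by walking an AST and by compiling the tree once into flat stack bytecode. The node types and opcodes (`push`, `load`, `add`, `mul`) are hypothetical, chosen to resemble the JVM sequence used later in these slides.

```javascript
// AST for the expression 2 * (a + b).
const ast = {
  type: "mul",
  left: { type: "const", value: 2 },
  right: {
    type: "add",
    left: { type: "var", name: "a" },
    right: { type: "var", name: "b" },
  },
};

// Tree-walking interpreter: revisits every node on every execution.
function evalAst(node, env) {
  switch (node.type) {
    case "const": return node.value;
    case "var": return env[node.name];
    case "add": return evalAst(node.left, env) + evalAst(node.right, env);
    case "mul": return evalAst(node.left, env) * evalAst(node.right, env);
    default: throw new Error("unknown node " + node.type);
  }
}

// Compile the tree once into flat, stack-based bytecode (post-order walk).
function compile(node, code = []) {
  switch (node.type) {
    case "const": code.push(["push", node.value]); break;
    case "var": code.push(["load", node.name]); break;
    case "add":
    case "mul":
      compile(node.left, code);
      compile(node.right, code);
      code.push([node.type]);
      break;
    default: throw new Error("unknown node " + node.type);
  }
  return code;
}

// Bytecode interpreter: a flat loop over instructions with an operand stack.
function runBytecode(code, env) {
  const stack = [];
  for (const [op, arg] of code) {
    switch (op) {
      case "push": stack.push(arg); break;
      case "load": stack.push(env[arg]); break;
      case "add": { const r = stack.pop(); stack.push(stack.pop() + r); break; }
      case "mul": { const r = stack.pop(); stack.push(stack.pop() * r); break; }
      default: throw new Error("unknown opcode " + op);
    }
  }
  return stack.pop();
}

const bytecode = compile(ast); // push 2, load a, load b, add, mul
console.log(runBytecode(bytecode, { a: 42, b: 7 })); // 98
```

The syntax-only structure (nesting, node kinds) disappears in the bytecode: what remains is a linear sequence that is closer to machine code and easier to optimize.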
Google V8

Google V8 is the JavaScript engine of Google Chrome. Three key areas of V8's performance:
Efficient garbage collection
Stop-the-world GC eliminates synchronization needs and reduces complexity
Generational GC with 2 generations: rapidly deletes short-lived objects
Fast property access
Computes hidden classes (more on the next slides)
Dynamic machine code generation
V8 skips bytecode and generates machine code
On initial execution, determine the hidden class
Patch the inline cache code to use it

Example: machine code for point.x
# ebx = the point object
cmp [ebx, <hidden class offset>], <cached hidden class>
jne <inline cache miss>
mov eax, [ebx, <cached x offset>]
V8 fast property access
C++, Java: faster because an object's class/memory layout is known
Each property is at a fixed offset; typically a single memory load to read/write a property
JavaScript: properties can be added to/deleted from objects on the fly

Layout unknown a priori; typically, a hash lookup to find a property's memory location
Idea: objects don't really change that much, so use hidden classes to cache the memory layout
Not a new idea: first used in Self (at Sun; 1989!)
Example:
function Point(x, y) {
  this.x = x;
  this.y = y;
}
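Hidden class creation
New Point created: the object starts out with the empty hidden class C0
this.x = x;
[Figure: V8 creates hidden class C1 (property x) and records a transition C0 -> C1 for adding x]
Hidden class creation (contd.)
this.y = y;
[Figure: V8 creates hidden class C2 (properties x, y) and records a transition C1 -> C2 for adding y]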
Hidden class reuse
If another Point object is created:
Initially, the Point object has no properties, so the newly created object refers to the initial class C0.
When property x is added, V8 follows the hidden class transition from C0 to C1 and writes the value of x at the offset specified by C1.
When property y is added, V8 follows the hidden class transition from C1 to C2 and writes the value of y at the offset specified by C2.
The runtime behavior of most JavaScript programs results in a high degree of structure sharing with this approach.
Two advantages of using hidden classes:

Property access does not require a dictionary lookup
Enables inline caching
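A minimal sketch of hidden classes with transitions, assuming a simplified object model (`HiddenClass`, `VMObject` and `newPoint` are hypothetical names, not V8 internals). Objects built by the same sequence of property additions end up sharing one hidden class.

```javascript
// A hidden class maps property names to fixed slot offsets and records
// transitions to the class obtained by adding one more property.
class HiddenClass {
  constructor(offsets = new Map()) {
    this.offsets = offsets;       // property name -> slot offset
    this.transitions = new Map(); // property name -> next HiddenClass
  }

  // Follow the transition for adding `name`, creating it on first use.
  addProperty(name) {
    let next = this.transitions.get(name);
    if (!next) {
      const offsets = new Map(this.offsets);
      offsets.set(name, offsets.size); // new property goes in the next slot
      next = new HiddenClass(offsets);
      this.transitions.set(name, next);
    }
    return next;
  }
}

const C0 = new HiddenClass(); // the empty class every new object starts with

// An object is a pointer to its hidden class plus a flat array of slots.
class VMObject {
  constructor() {
    this.hiddenClass = C0;
    this.slots = [];
  }

  set(name, value) {
    let offset = this.hiddenClass.offsets.get(name);
    if (offset === undefined) {
      this.hiddenClass = this.hiddenClass.addProperty(name); // transition
      offset = this.hiddenClass.offsets.get(name);
    }
    this.slots[offset] = value;
  }

  get(name) {
    const offset = this.hiddenClass.offsets.get(name);
    return offset === undefined ? undefined : this.slots[offset];
  }
}

// Equivalent of `new Point(x, y)` from the slides.
function newPoint(x, y) {
  const p = new VMObject();
  p.set("x", x); // C0 -> C1
  p.set("y", y); // C1 -> C2
  return p;
}

const p1 = newPoint(1, 2);
const p2 = newPoint(3, 4);
console.log(p1.hiddenClass === p2.hiddenClass); // true: structure is shared
```

In this sketch the offset lookup is still a `Map` access; the payoff in a real engine comes from generated code that checks the hidden class pointer once and then reads the slot at the cached offset, as in the point.x machine code shown earlier.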
Inline caching
Extremely efficient at optimizing virtual method calls
To call Obj.ToString(), the runtime must know Obj's type in order to find ToString
Initially, the call site is uninitialized: a full method lookup is performed
On the first call, remember the object type (i.e., the ToString address) and change the state to monomorphic

Always perform the call directly, as long as the object type does not change
If the object type changes, switch back to uninitialized
What about this case?

var values = [1, "a", 2, "b", 3, "c", 4, "d"];
for (var i = 0; i < values.length; i++) {
  document.write(values[i].toString());
}
Solution: keep a limited number of different types (e.g., 2)

Switch the state from monomorphic to polymorphic
If even more object types occur, change to megamorphic and disable inline caching
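The call-site state machine can be sketched as follows. This is a simplified model under stated assumptions: receiver types are plain objects with a `methods` dictionary, the polymorphic limit is 2 as on the slide, and `CallSiteCache`, `NumberType`, `StringType` and `BooleanType` are hypothetical names. It implements the polymorphic variant rather than switching back to uninitialized on a type change.

```javascript
const MAX_POLYMORPHIC_ENTRIES = 2; // "a limited number of different types (e.g., 2)"

// Hypothetical receiver types, each with its own method dictionary.
const NumberType = { methods: { toString: (o) => String(o.value) } };
const StringType = { methods: { toString: (o) => o.value } };
const BooleanType = { methods: { toString: (o) => (o.value ? "true" : "false") } };

// One inline cache per call site, e.g. for values[i].toString().
class CallSiteCache {
  constructor(methodName) {
    this.methodName = methodName;
    this.entries = [];            // cached [type, method] pairs
    this.state = "uninitialized";
    this.fullLookups = 0;         // how often the slow path ran
  }

  // Slow path: dictionary lookup on the receiver's type.
  lookup(type) {
    this.fullLookups += 1;
    return type.methods[this.methodName];
  }

  call(obj) {
    if (this.state !== "megamorphic") {
      const hit = this.entries.find(([t]) => t === obj.type);
      if (hit) return hit[1](obj); // fast path: cached target, no lookup
    }
    const method = this.lookup(obj.type);
    if (this.state === "megamorphic") return method(obj); // caching disabled
    if (this.entries.length < MAX_POLYMORPHIC_ENTRIES) {
      this.entries.push([obj.type, method]);
      this.state = this.entries.length === 1 ? "monomorphic" : "polymorphic";
    } else {
      this.entries = [];          // too many types: give up on caching
      this.state = "megamorphic";
    }
    return method(obj);
  }
}

// The loop from the slide: numbers and strings alternate at one call site.
const demoSite = new CallSiteCache("toString");
for (const v of [1, "a", 2, "b"]) {
  demoSite.call({ type: typeof v === "number" ? NumberType : StringType, value: v });
}
console.log(demoSite.state, demoSite.fullLookups); // polymorphic 2
```

Only the first number and the first string pay for a full lookup; every later call is served from the two cached entries.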
Firefox's response: TraceMonkey
An incremental improvement over a bytecode interpreter
But does not preclude dynamic code generation
Uses trace trees instead of a control flow graph as the internal representation of the program structure

Basically, it watches for commonly repeated actions and optimizes the hot paths
Function inlining is easy to perform
Type inference is easier to perform (e.g., determine whether a+b is string concatenation or number addition)
Looping overhead is greatly diminished
Firefox also added some polymorphic property caching
But only for prototype objects?
TraceMonkey is based on Adobe's Tamarin-Tracing
Traditional CFG
Trace tree
Trace trees (contd)
Constructed and extended lazily
Designed to represent loops (the performance-critical parts)
Anchor (loop header) discovered by dynamic profiling, not static analysis
Can include instructions from called methods (inlining)
Can have side exits

Restore the VM/interpreter state
Resume interpretation
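How an anchor might be discovered by profiling can be sketched with a tiny bytecode interpreter that counts backward jumps per loop header. The opcodes, `LoopProfiler` and `HOT_LOOP_THRESHOLD` (set to 3 here) are assumptions for illustration; trace recording itself is not shown.

```javascript
// Hypothetical threshold: backward jumps to the same header before it is hot.
const HOT_LOOP_THRESHOLD = 3;

class LoopProfiler {
  constructor() {
    this.counters = new Map(); // loop header pc -> backward jumps seen
    this.anchors = new Set();  // headers where trace recording would start
  }

  // Called by the interpreter for every taken jump.
  onJump(fromPc, toPc) {
    if (toPc > fromPc) return; // forward jump: not a loop back edge
    const n = (this.counters.get(toPc) || 0) + 1;
    this.counters.set(toPc, n);
    if (n >= HOT_LOOP_THRESHOLD) this.anchors.add(toPc);
  }
}

// sum = 0; i = 0; while (i < n) { sum += i; i++; } return sum;
const program = [
  ["push", 0], ["store", "sum"],        // pc 0-1
  ["push", 0], ["store", "i"],          // pc 2-3
  ["load", "i"], ["load", "n"], ["lt"], // pc 4-6: loop header
  ["jumpIfFalse", 14],                  // pc 7: leave the loop
  ["load", "sum"], ["load", "i"], ["add"], ["store", "sum"], // pc 8-11
  ["inc", "i"],                         // pc 12
  ["jump", 4],                          // pc 13: backward jump to the header
  ["load", "sum"],                      // pc 14: result
];

function interpret(code, env, profiler) {
  const stack = [];
  let pc = 0;
  while (pc < code.length) {
    const [op, arg] = code[pc];
    let next = pc + 1;
    switch (op) {
      case "push": stack.push(arg); break;
      case "load": stack.push(env[arg]); break;
      case "store": env[arg] = stack.pop(); break;
      case "add": { const r = stack.pop(); stack.push(stack.pop() + r); break; }
      case "lt": { const r = stack.pop(); stack.push(stack.pop() < r); break; }
      case "inc": env[arg] += 1; break;
      case "jump": next = arg; break;
      case "jumpIfFalse": if (!stack.pop()) next = arg; break;
      default: throw new Error("unknown opcode " + op);
    }
    if (next !== pc + 1) profiler.onJump(pc, next); // report taken jumps
    pc = next;
  }
  return stack.pop();
}

const profiler = new LoopProfiler();
console.log(interpret(program, { n: 5 }, profiler)); // 10
console.log([...profiler.anchors]);                  // [ 4 ]
```

The anchor is found purely from run-time behavior: no control flow analysis of the bytecode is needed, only a counter per backward-jump target.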
Trace trees loop nest
Trace trees compilation
Optimization is greatly simplified: a trace is effectively in SSA form just by renaming
With one exception: can you see it?
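Renaming a linear trace into SSA form takes a single forward pass, because a trace has no internal merge points. A minimal sketch, assuming a hypothetical instruction shape `{ dst, op, args }` and a `name#version` naming scheme (values live on entry to the trace get version 0):

```javascript
// Rename every definition in a straight-line trace to a fresh version;
// every use refers to the most recent definition seen so far.
function toSSA(trace) {
  const version = new Map(); // variable -> current version number
  const current = (v) =>
    typeof v === "number" ? v : v + "#" + (version.get(v) || 0);
  return trace.map(({ dst, op, args }) => {
    const renamedArgs = args.map(current);  // rename uses first
    const n = (version.get(dst) || 0) + 1;  // then give the definition a new name
    version.set(dst, n);
    return { dst: dst + "#" + n, op, args: renamedArgs };
  });
}

// Loop body of: sum += i; i++;
console.log(toSSA([
  { dst: "sum", op: "add", args: ["sum", "i"] },
  { dst: "i", op: "add", args: ["i", 1] },
]));
```

Each name is defined exactly once in the output, which is the SSA property; no dominance frontiers or control flow analysis are needed for a single path.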
Register allocation
Traverse all traces in reverse recording order
Guarantee that all uses are seen before the definition
Use a simple scheme to allocate registers
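The reverse-order allocation scheme can be sketched as a greedy pass over an SSA trace: walking backwards, the first time a value is seen is its last use, so it gets a register there, and the register is released again at the value's definition. `allocateRegisters`, the `liveOut` parameter and the spill handling (just recording the value) are simplifying assumptions.

```javascript
// Backward register allocation over an SSA trace of { dst, op, args }.
function allocateRegisters(trace, registers, liveOut = []) {
  const free = [...registers];
  const assignment = new Map(); // SSA value -> register
  const spilled = [];           // values that found no free register
  const need = (v) => {
    if (typeof v !== "string" || assignment.has(v) || spilled.includes(v)) return;
    if (free.length > 0) assignment.set(v, free.pop());
    else spilled.push(v);       // a real allocator would assign a stack slot here
  };
  liveOut.forEach(need);        // values still needed after the trace
  for (let i = trace.length - 1; i >= 0; i--) {
    const { dst, args } = trace[i];
    need(dst);                  // even a dead result needs somewhere to go
    // Above its definition the value does not exist yet: free its register,
    // which lets an operand of this same instruction reuse it.
    if (assignment.has(dst)) free.push(assignment.get(dst));
    args.forEach(need);         // uses seen before the (earlier) definitions
  }
  return { assignment, spilled };
}

// t1 = a0 + b0; t2 = t1 * 2   (t2 is live after the trace)
const { assignment } = allocateRegisters(
  [
    { dst: "t1", op: "add", args: ["a0", "b0"] },
    { dst: "t2", op: "mul", args: ["t1", 2] },
  ],
  ["eax", "ebx", "ecx"],
  ["t2"],
);
console.log(assignment);
```

Because every use is visited before the definition, a single pass knows exactly when each value dies, without computing live ranges up front.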
Type specialization

Speculate on a variable's type based on historical information
Insert guards that return to interpreted mode if the type changes
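Type specialization with a guard can be sketched for a single `a + b` site: record the operand types the interpreter has seen, and if history says both are always numbers, produce a version that checks that assumption and falls back to the generic path otherwise. `TypeProfile`, `specialize` and the side-exit callback are hypothetical; in JavaScript both paths use the same `+`, whereas a real trace compiler would emit a plain machine add on the fast path.

```javascript
// Generic (interpreter) semantics of a + b: concatenation or addition.
function genericAdd(a, b) {
  return a + b;
}

// Operand types observed at one `a + b` site.
class TypeProfile {
  constructor() { this.seen = new Set(); }
  record(a, b) { this.seen.add(typeof a + "," + typeof b); }
  // Speculate only if history shows a single, numeric type pair.
  allNumbers() { return this.seen.size === 1 && this.seen.has("number,number"); }
}

// Build a specialized version of the site: a type guard, then the fast path.
function specialize(profile, onSideExit) {
  if (!profile.allNumbers()) return genericAdd; // no safe speculation
  return function specializedAdd(a, b) {
    if (typeof a !== "number" || typeof b !== "number") { // guard
      onSideExit();             // leave the specialized code
      return genericAdd(a, b);  // continue with interpreted semantics
    }
    return a + b; // numeric add only; in native code, a machine add
  };
}

const profile = new TypeProfile();
profile.record(1, 2);
profile.record(3, 4);
let exits = 0;
const add = specialize(profile, () => { exits += 1; });
console.log(add(2, 3), add("a", 1), exits); // 5 a1 1
```

The guard is what makes speculation safe: the fast path is only reached when the historical assumption still holds.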
Results
How about tools?
Modern IDE expectations: autocomplete, jump-to-definition, browsing, refactoring
First hints: syntax

var x = 12.5;
var y = { a: 1, b: 2 };
function foo(a, b) { return a + b; }
Next hints: inference

var x = 12.5;
var y = x;
Can apply standard techniques used for optimization!
Common idioms / coding style / JSDoc comments
Type inference?
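The same propagation an optimizer uses can drive an editor feature. A minimal sketch, assuming the declarations have already been parsed into a hypothetical shape (`{ name, init }`) and that variables keep their initial type; `inferTypes` and `completions` are illustrative names.

```javascript
// var x = 12.5; var y = { a: 1, b: 2 }; var z = x; var w = y;
const decls = [
  { name: "x", init: { kind: "literal", value: 12.5 } },
  { name: "y", init: { kind: "object", props: { a: 1, b: 2 } } },
  { name: "z", init: { kind: "ref", name: "x" } },
  { name: "w", init: { kind: "ref", name: "y" } },
];

// Infer a type per declaration by propagating along assignments,
// assuming (as the slides suggest) that variable types don't change.
function inferTypes(declarations) {
  const types = new Map();
  for (const { name, init } of declarations) {
    let t;
    switch (init.kind) {
      case "literal": t = { kind: typeof init.value }; break;
      case "object": t = { kind: "object", props: Object.keys(init.props) }; break;
      case "ref": t = types.get(init.name) || { kind: "unknown" }; break;
      default: t = { kind: "unknown" };
    }
    types.set(name, t);
  }
  return types;
}

// Autocomplete: property names to suggest after typing "name."
function completions(types, name) {
  const t = types.get(name);
  return t && t.kind === "object" ? t.props : [];
}

const types = inferTypes(decls);
console.log(types.get("z").kind);      // number
console.log(completions(types, "w"));  // [ 'a', 'b' ]
```

If the assumption is occasionally wrong, the worst case is a missing or extra suggestion, which is the kind of probabilistic trade-off discussed next.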
Tool support for dynamic languages
What you can't solve deterministically, solve probabilistically

Make assumptions based on what you know (e.g., variable types don't change)
Monte Carlo methods?
It's not the end of the world if you're wrong
How about refactoring/rename?
Java/.NET can't do it perfectly, either! (What about dynamic class loading/reflection? Struts/XML configuration files? DB persistence layers?)
Conclusions:

Still plenty of low-hanging fruit in the area of dynamic language research
Can apply optimization theory to IDEs
Credits
Presentation assembled mainly from:

http://code.google.com/apis/v8/design.html
http://andreasgal.com/
Steve Yegge's speech, Stanford EE Dept Computer Systems Colloquium, May 2008
http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html
http://www.cs.berkeley.edu/~jcondit/cs265/expert.html
Jeremy Condit presentation (CS 265 Expert Topic, 23 April 2003)
BACKUP SLIDES
The Java Virtual Machine

Each frame contains local variables and an operand stack
Instruction set:

Load/store between locals and the operand stack
Arithmetic on the operand stack
Object creation and method invocation
Array/field accesses
Control transfers and exceptions
The type of the operand stack at each program point is known at compile time
Java Virtual Machine (contd.)
Example:
iconst 2
iload a
iload b
iadd
imul
istore c
Computes: c := 2 * (a + b)

Example, step by step (operand stack listed bottom to top):
Instruction   a    b    c    Operand stack
(start)       42   7    0    (empty)
iconst 2      42   7    0    2
iload a       42   7    0    2, 42
iload b       42   7    0    2, 42, 7
iadd          42   7    0    2, 49
imul          42   7    0    98
istore c      42   7    98   (empty)
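The operand-stack walkthrough can be reproduced with a small interpreter for just these five instructions. A minimal sketch: locals are addressed by name for readability (a real JVM uses numbered local slots and mnemonics such as `iconst_2`), and `runJvm` is a hypothetical name. `| 0` and `Math.imul` keep the arithmetic in 32-bit two's complement, like Java `int`.

```javascript
// Execute JVM-style instructions and record the operand stack after each one.
function runJvm(code, locals) {
  const stack = [];
  const trace = [];
  for (const insn of code) {
    const [op, arg] = insn.split(" ");
    switch (op) {
      case "iconst": stack.push(Number(arg)); break;
      case "iload": stack.push(locals[arg]); break;
      case "istore": locals[arg] = stack.pop(); break;
      case "iadd": { const r = stack.pop(); stack.push((stack.pop() + r) | 0); break; }
      case "imul": { const r = stack.pop(); stack.push(Math.imul(stack.pop(), r)); break; }
      default: throw new Error("unsupported instruction " + insn);
    }
    trace.push(insn + ": [" + stack.join(", ") + "]");
  }
  return trace;
}

const locals = { a: 42, b: 7, c: 0 };
runJvm(["iconst 2", "iload a", "iload b", "iadd", "imul", "istore c"], locals)
  .forEach((line) => console.log(line));
console.log(locals.c); // 98
```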
Lazy Code Selection

Introduced by the Intel VTune JIT compiler ([Adl-Tabatabai 98])
Idea: use a mimic stack to simulate the execution of the operand stack
Instead of the actual values, the mimic stack holds the locations of the values
Lazy Code Selection (contd)
Each operand on the stack is an element from a class hierarchy:
Operand
  Immediate
  Memory
    Field
    Array
    Static
    Stack
    Constant
  Register
  FP Stack
Example: iconst 2, iload a, iload b, iadd, imul, istore c
Locations of the locals: a in Reg eax, b in Stack 4, c in Stack 8 (mimic stack listed bottom to top)

Instruction   Mimic stack                  Code emitted
(start)       (empty)
iconst 2      Imm 2
iload a       Imm 2, Reg eax
iload b       Imm 2, Reg eax, Stack 4
iadd          Imm 2, Reg ebx               movl ebx, eax
                                           addl ebx, 4(esp)
imul          Reg ebx                      sall ebx, 1
istore c      (empty)                      movl 8(esp), ebx
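The mimic-stack technique can be sketched as a tiny code selector for the same six instructions. It assumes the locations from the example (a in eax, b at 4(esp), c at 8(esp)), ebx/ecx/edx as free scratch registers, and hypothetical names (`selectCode`, `homes`); it emits the slides' assembly notation and reproduces the four instructions of the example.

```javascript
// Where each local lives, as on the slides: a in eax, b and c in stack slots.
const homes = {
  a: { kind: "reg", name: "eax", temp: false },
  b: { kind: "stack", offset: 4 },
  c: { kind: "stack", offset: 8 },
};

// Render a location as an operand in the slides' notation.
function operand(loc) {
  if (loc.kind === "imm") return String(loc.value);
  if (loc.kind === "reg") return loc.name;
  return loc.offset + "(esp)";
}

function isPowerOfTwo(loc) {
  return loc.kind === "imm" && loc.value > 0 && (loc.value & (loc.value - 1)) === 0;
}

// Assumes a local is not stored to while a lazy reference to it is still
// on the mimic stack (a real selector would materialize it first).
function selectCode(bytecode, homes) {
  const mimic = [];                        // locations of values, not values
  const freeTemps = ["ebx", "ecx", "edx"]; // assumed scratch registers
  const out = [];

  // Return a register holding `loc` that may safely be overwritten.
  const toTemp = (loc) => {
    if (loc.kind === "reg" && loc.temp) return loc;
    if (freeTemps.length === 0) throw new Error("out of scratch registers");
    const tmp = { kind: "reg", name: freeTemps.shift(), temp: true };
    out.push(`movl ${tmp.name}, ${operand(loc)}`);
    return tmp;
  };
  const release = (loc) => {
    if (loc.kind === "reg" && loc.temp) freeTemps.unshift(loc.name);
  };

  for (const insn of bytecode) {
    const [op, arg] = insn.split(" ");
    switch (op) {
      case "iconst": mimic.push({ kind: "imm", value: Number(arg) }); break; // no code
      case "iload": mimic.push(homes[arg]); break;                           // no code
      case "iadd": {
        const right = mimic.pop();
        const left = toTemp(mimic.pop());
        out.push(`addl ${left.name}, ${operand(right)}`); // memory operand folded in
        release(right);
        mimic.push(left);
        break;
      }
      case "imul": {
        let right = mimic.pop();
        let left = mimic.pop();
        if (isPowerOfTwo(left)) [left, right] = [right, left]; // commutative
        const dst = toTemp(left);
        if (isPowerOfTwo(right)) {
          // Strength reduction: multiply by 2^k becomes shift left by k.
          out.push(`sall ${dst.name}, ${31 - Math.clz32(right.value)}`);
        } else {
          out.push(`imull ${dst.name}, ${operand(right)}`);
          release(right);
        }
        mimic.push(dst);
        break;
      }
      case "istore": {
        const dest = homes[arg];
        let value = mimic.pop();
        // x86 has no memory-to-memory move: go through a register first.
        if (value.kind === "stack" && dest.kind === "stack") value = toTemp(value);
        out.push(`movl ${operand(dest)}, ${operand(value)}`);
        release(value);
        break;
      }
      default: throw new Error("unsupported instruction " + insn);
    }
  }
  return out;
}

selectCode(["iconst 2", "iload a", "iload b", "iadd", "imul", "istore c"], homes)
  .forEach((line) => console.log(line));
```

Nothing is emitted for `iconst` and `iload`: their operands are only materialized when an instruction consumes them, which is what allows the load of b to be folded into `addl` and the multiply by 2 to become a shift.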
Achieves several results

Converts a stack-based architecture to a register-based architecture
Folds computations into more complex x86 instructions
Allows additional optimizations

Strength reduction and constant propagation
Redundant load-after-store elimination
Disadvantages
Extra operands are spilled to the stack at basic block boundaries
Exception Handling

Problem: exceptions aren't thrown often, but they complicate control flow
Solution: on-demand exception translation

Maintain a mapping of native code addresses to original bytecode addresses
When an exception occurs, look up the original address and jump to the appropriate exception handler
Results in less compilation overhead and bigger basic blocks
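On-demand exception translation can be sketched as a lookup that runs only when an exception is thrown. The address map, exception table layout and `findHandler` are hypothetical; matching is by exact exception type, whereas the JVM also accepts subclasses of the catch type.

```javascript
// Recorded at compile time: native code starting at `nativeStart` was
// generated from bytecode at `bytecodePc`. Sorted by nativeStart.
const addressMap = [
  { nativeStart: 0x1000, bytecodePc: 0 },
  { nativeStart: 0x1010, bytecodePc: 4 },
  { nativeStart: 0x1024, bytecodePc: 9 },
  { nativeStart: 0x1040, bytecodePc: 15 },
];

// The method's exception table in bytecode offsets (start inclusive, end exclusive).
const exceptionTable = [
  { startPc: 4, endPc: 15, handlerPc: 20, type: "MyException" },
];

// Binary search for the last entry whose nativeStart <= address.
function toBytecodePc(map, address) {
  let lo = 0, hi = map.length - 1, found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (map[mid].nativeStart <= address) { found = mid; lo = mid + 1; }
    else hi = mid - 1;
  }
  if (found < 0) throw new Error("address outside compiled method");
  return map[found].bytecodePc;
}

// Runs only when an exception is actually thrown, so the compiled code
// carries no extra control flow edges for exceptions.
function findHandler(address, exceptionType) {
  const pc = toBytecodePc(addressMap, address);
  const entry = exceptionTable.find(
    (e) => pc >= e.startPc && pc < e.endPc && e.type === exceptionType);
  return entry ? entry.handlerPc : null; // null: propagate to the caller
}

console.log(findHandler(0x1015, "MyException")); // 20
```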
Exception Handling (contd)

Problem: exceptions are not always rare
Solution: inlining
Eliminate exceptions when possible

try {
  throw new MyException();  // thrown and caught in the same method: can become a direct jump to the handler
} catch (MyException e) {
}
Use method inlining to create more opportunities
AIX JDK [Ishizaki 99]

JIT compiler for PowerPC (32-bit RISC)
Contributions:

Null check elimination
Array bounds check elimination
Global common subexpression elimination
Type inclusion test optimization
Static method call inlining
Dynamic method call resolution
None of which matter very much:
Each optimization is fairly effective
Over 50% of run-time null checks eliminated
But the overall effects are relatively small
At most 10% improvement in overall execution time
