
Just in time compilation (JIT)

Virgil Palanciuc

Just-in-Time Compilation

Compiler runs at program execution time


- Popularized by Java virtual machine implementations
- Preserves the interpretive character (portability)

Challenge: minimize the sum of a program's compile time and execution time

Continuous Compilation (idea: mix interpretation and compilation)

- Instead of pausing to compile, simply fire up the interpreter
- In the background, start compiling code, with emphasis on compilation units that we interpret often
- When code is compiled, jump to the native code instead

Smart Just-in-Time

- Estimate whether compilation is actually worthwhile
  - Estimate compile time as a function of code size
  - Observe time spent interpreting code
- If compilation is worthwhile (heuristic), stop interpreting and compile instead (see the sketch after this list)
- No second processor needed
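The heuristic can be made concrete with a small amount of bookkeeping. Below is a minimal JavaScript sketch, assuming a linear compile-cost model; CompilationUnit, estimateCompileTime, and COST_PER_BYTE are illustrative names, not any particular VM's API:

  // Per-unit bookkeeping (a unit might be a method or a loop body).
  function CompilationUnit(codeSize) {
    this.codeSize = codeSize;      // bytecode size drives the compile-time estimate
    this.interpretedTime = 0;      // total time spent interpreting this unit so far
    this.compiled = false;
  }

  // Assumption: compile time grows roughly linearly with code size.
  function estimateCompileTime(unit) {
    var COST_PER_BYTE = 0.01;      // illustrative constant; a real VM would tune this
    return unit.codeSize * COST_PER_BYTE;
  }

  // Called after each interpreted execution of the unit.
  function maybeCompile(unit, elapsed, compile) {
    unit.interpretedTime += elapsed;
    // Once interpreting has cost more than one compilation would,
    // stop interpreting and compile instead; no second processor needed.
    if (!unit.compiled && unit.interpretedTime > estimateCompileTime(unit)) {
      compile(unit);
      unit.compiled = true;
    }
  }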

Java's Programming Environment

- Java source code compiles to Java bytecode
- Bytecode is verifiably secure, compact, and platform independent
- A virtual machine for the target platform executes the program

Pipeline: Java source code -> compiler -> Java bytecode -> network (?) -> JIT compiler -> native code -> native execution -> output

Java: Summary and conclusions

JIT compilation is effective
- Reduces overall running time compared to an interpreter
- Handles dynamic class loading
- Register allocation is extremely important

"Pretty good" is usually good enough
- Simple register allocation schemes are just as good as complex ones
- Just-in-time compilation is usually better than interpretation
- Simple versions of standard algorithms provide most of the benefit
  - Especially register allocation!

Introducing: dynamic languages

- Have been around for quite a while: Perl, JavaScript, Python, Ruby, Tcl
- Popular opinions:
  - Unfixably slow
  - Not possible to create good IDE tools
  - Maintenance traps as the codebase grows larger
- Observation: techniques for creating tools for dynamic languages are similar to those for improving performance

Why are dynamic languages slow?

- Lack of effort: used for scripting, i.e. I/O bound
- Hard to compile with traditional techniques:
  - Object and variable types can change
  - Methods can be added/removed
  - Target machine feature mismatches
- Example: in C, inlining a method is straightforward; C++ dynamic method dispatch is more difficult; JavaScript is even more difficult (method lookup in a dictionary); an example follows below
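A small JavaScript illustration of the problem (the object and method names are invented): the method bound at a call site can change, or disappear, between two executions of the same line:

  var obj = { greet: function () { return "hello"; } };
  obj.greet();                            // method found in obj's own dictionary

  obj.greet = function () { return 42; }; // same property, different behavior and result type
  obj.greet();

  delete obj.greet;                       // method removed at runtime
  // obj.greet();                         // would now throw a TypeError
  // A static compiler cannot bind obj.greet to a fixed address:
  // naively, every call becomes a dictionary lookup.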

Dynamic = lack of performance?

- Warning: controversial slide
- Technology has evolved:
  - Widespread belief that Java is just as fast as C++
  - JavaScript: Google V8, SpiderMonkey/TraceMonkey
- Cultural problem: programmers often prefer to micro-optimize
  - Actual design and system perspective requires more thought; micro-optimization is accessible
- Global optimizations always trump benchmarks!
  - Java: slower than C++ in benchmarks, faster overall, especially on multicore/SMT
  - Ruby on Rails 20% faster than Struts, although Ruby is way slower than Java

Case study: JavaScript

At a glance:

- Java-like syntax, prototype-based OOP (a short example follows below)
  - Not class-based! OOP comes in several flavors: prototype, class/single dispatch, class/multiple dispatch
  - Prototype: JavaScript; no classes, each object is effectively its own class
  - Single dispatch: C++, Java (virtual methods; dispatch based on "this")
  - Multiple dispatch: dispatch based on all arguments
- Lexical scoping, first-class functions, closures
- ECMAScript edition 4: optional types
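A brief JavaScript sketch of prototype-based dispatch, for contrast with class-based single dispatch; the animal/dog objects are invented for illustration:

  var animal = { speak: function () { return this.name + " makes a sound"; } };

  // No class: dog delegates to animal through its prototype link.
  var dog = Object.create(animal);
  dog.name = "Rex";
  dog.speak();       // "Rex makes a sound" (method found on the prototype)

  // Each object can diverge from its "class" at any time:
  dog.speak = function () { return this.name + " barks"; };
  dog.speak();       // "Rex barks" (own property now shadows the prototype)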

Ajax caused a popularity surge

- Sudden focus on improving performance
- Browser war: JavaScript interpreters get faster and faster

Efficient JIT compilation

- Must think differently; the trick is to use heuristics:
  - Profile to get probabilistic information
  - Speculate on what happens frequently
  - Apply simple/fast optimizations to the frequent dynamic types/values
- Inlining -> polymorphic inline cache
- Heuristics can apply to the internal representation
  - TraceMonkey, Firefox's response to Chrome's V8

First step: bytecode execution

Traditional interpreters are abstract syntax tree walkers:

- Parse the program into a tree of statements and expressions
- Visit the nodes in the tree, performing their operations and propagating execution state

Bytecode interpreters:

- Eliminate nodes that represent just syntactic structure
- Bring the representation closer to the machine
- Enable additional optimizations on bytecode (as done on the JVM)

(a toy sketch of both styles follows below)
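A toy JavaScript sketch contrasting the two styles on the expression a + b; the node shapes and opcode names are invented for illustration:

  // 1. AST walker: recursively visit syntax nodes.
  function evalNode(node, env) {
    switch (node.type) {
      case "lit": return node.value;
      case "var": return env[node.name];
      case "add": return evalNode(node.left, env) + evalNode(node.right, env);
    }
  }
  evalNode({ type: "add",
             left:  { type: "var", name: "a" },
             right: { type: "var", name: "b" } },
           { a: 1, b: 2 });                                        // 3

  // 2. Bytecode interpreter: flat instruction list, explicit operand stack,
  //    no nodes left that represent mere syntactic structure.
  function run(code, env) {
    var stack = [];
    for (var pc = 0; pc < code.length; pc++) {
      var ins = code[pc];
      if (ins.op === "load")     stack.push(env[ins.name]);
      else if (ins.op === "add") stack.push(stack.pop() + stack.pop());
    }
    return stack.pop();
  }
  run([{ op: "load", name: "a" }, { op: "load", name: "b" }, { op: "add" }],
      { a: 1, b: 2 });                                             // 3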


Google V8

Google V8 is the JS engine in Google Chrome. 3 key areas of V8's performance:

Efficient garbage collection
- Stop-the-world GC eliminates synchronization needs, reduces complexity
- Generational GC with 2 generations: rapidly reclaims short-lived objects

Fast property access
- Computes hidden classes (more on the next slides)

Dynamic machine code generation
- V8 skips bytecode and generates machine code directly
- On initial execution, determine the hidden class
- Patch the inline cache code to use it

Example: the access point.x compiles to

  # ebx = the point object
  cmp [ebx, <hidden class offset>], <cached hidden class>
  jne <inline cache miss>
  mov eax, [ebx, <cached x offset>]

V8 fast property access

C++, Java: faster because an object's class/memory layout is known
- Fixed offset to access a property
- Typically a single memory load to read/write a property

JavaScript: properties can be added to/deleted from objects on the fly
- Layout unknown a priori; typically a hash lookup to find a property's memory location
- Idea: objects don't really change that much, so use hidden classes to cache the memory layout
- Not a new idea; first used in Self (at Sun, 1989!)

Example:

  function Point(x, y) {
    this.x = x;
    this.y = y;
  }


Hidden class creation

When a new Point is created:

  this.x = x;   [figure: transition from the empty hidden class C0 to C1, which records the offset of x]
  this.y = y;   [figure: transition from C1 to C2, which also records the offset of y]

Hidden class reuse

If another Point object is created:

- Initially the new Point object has no properties, so it refers to the initial hidden class C0
- When property x is added, V8 follows the hidden class transition from C0 to C1 and writes the value of x at the offset specified by C1
- When property y is added, V8 follows the hidden class transition from C1 to C2 and writes the value of y at the offset specified by C2

The runtime behavior of most JavaScript programs results in a high degree of structure sharing under this approach. Two advantages of using hidden classes:

- Property access does not require a dictionary lookup
- Enables inline caching

(a practical consequence is illustrated below)
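One practical consequence, using the Point constructor above: objects that acquire the same properties in the same order share the transition chain, while a different order forks it. This is standard V8 guidance rather than something stated on the slides:

  function Point(x, y) { this.x = x; this.y = y; }  // always x first, then y
  var p1 = new Point(1, 2);
  var p2 = new Point(3, 4);   // same transitions C0 -> C1 -> C2: p1 and p2 share a hidden class

  var q = {};
  q.y = 2;                    // y first...
  q.x = 1;                    // ...then x: a different transition chain, a different hidden class
  // Property accesses on q cannot reuse inline caches trained on p1 and p2.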


Inline caching

Extremely efficient at optimizing virtual method calls.

- obj.ToString(): must know obj's type in order to call ToString
- Initially the call site is uninitialized; a full method lookup is performed
- On the first call, remember the object's type (i.e. the ToString address) and switch the site to monomorphic
  - Always perform the call directly, as long as the object type does not change
  - If the object type changes, switch back to uninitialized

What about this case?

  var values = [1, "a", 2, "b", 3, "c", 4, "d"];
  for (var i = 0; i < values.length; i++) {
    document.write(values[i].toString());
  }

Solution: keep a limited number of different types (e.g. 2)

- Switch the state from monomorphic to polymorphic
- If even more object types occur, change to megamorphic and disable inline caching

(a toy simulation of this state machine follows below)
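A minimal JavaScript simulation of the state machine above. A real engine patches machine code in place; here the object's prototype stands in for its hidden class, and the cache is a plain array:

  function CallSite(maxPolymorphic) {
    this.cache = [];                           // [{ klass, target }] pairs
    this.max = maxPolymorphic;
    this.megamorphic = false;
  }

  CallSite.prototype.invoke = function (obj, name) {
    if (this.megamorphic) return obj[name]();  // IC disabled: full lookup every call

    var klass = Object.getPrototypeOf(obj);    // stand-in for the hidden class
    for (var i = 0; i < this.cache.length; i++)
      if (this.cache[i].klass === klass)
        return this.cache[i].target.call(obj); // cache hit: direct call, no lookup

    if (this.cache.length < this.max) {        // miss: uninitialized -> mono -> polymorphic
      this.cache.push({ klass: klass, target: obj[name] });
      return obj[name]();
    }
    this.megamorphic = true;                   // too many types seen
    return obj[name]();
  };

  var site = new CallSite(2);                  // keep at most 2 types, as on the slide
  site.invoke(new Number(1), "toString");      // first type: monomorphic
  site.invoke(new String("a"), "toString");    // second type: polymorphic
  site.invoke(new Date(), "toString");         // third type: megamorphic, IC disabled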


Firefox's response: TraceMonkey

- An incremental improvement over a bytecode interpreter
  - But does not preclude dynamic code generation
- Uses trace trees instead of a control flow graph as the internal representation of program structure
  - Basically, it watches for commonly repeated actions and optimizes the hot paths
  - Easy to perform function inlining
  - Easier to perform type inference (e.g. determine whether a+b is string concatenation or numeric addition)
  - Looping overhead greatly diminished
- Firefox also added some polymorphic property caching
  - But only for prototype objects?
- TraceMonkey is based on Adobe's Tamarin:Tracing


Traditional CFG

[figure: control flow graph]

Trace tree

[figure: trace tree]

Trace trees (cont'd)

- Constructed and extended lazily
- Designed to represent loops (the performance-critical parts)
- Anchor (loop header) discovered by dynamic profiling, not static analysis
- Can include instructions from called methods (inlining)
- Can have side exits (see the sketch after this list):
  - Restore VM/interpreter state
  - Resume interpretation
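A hypothetical JavaScript loop showing where the recorded trace would guard and side-exit; the array contents are contrived to break the type speculation:

  var xs = [];
  for (var i = 0; i < 10000; i++) xs.push(i);
  xs[9000] = "oops";                 // one element violates the numeric assumption

  var sum = 0;
  for (var i = 0; i < xs.length; i++) {
    // The hot loop is recorded as a trace that speculates xs[i] is a number
    // and compiles a numeric add, protected by a type guard.
    // At i == 9000 the guard fails: side exit, restore interpreter state,
    // resume interpretation (and possibly record a new trace for strings).
    sum += xs[i];
  }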

Trace trees: loop nests

[figure]

Trace trees: compilation

- Optimization is greatly simplified; a trace is effectively in SSA form just by renaming
  - With one exception: can you see it?
- Register allocation
  - Traverse all traces in reverse recording order
    - Guarantees that all uses are seen before the definition
  - Use a simple scheme to allocate registers
- Type specialization
  - Speculate on variable types based on historic information
  - Insert guards to fall back to interpreted mode if a type changes


Results

[figure: benchmark results]

How about tools?

Modern IDE expectations: autocomplete, jump-to-definition, browsing, refactoring.

First hints: syntax

  var x = 12.5;
  var y = { a: 1, b: 2 };
  function foo(a, b) { return a + b; }

Next hints: inference

  var x = 12.5;
  var y = x;

- Can apply standard techniques used for optimization!
- Common idioms / coding style / jsdoc comments
- Type inference? (a toy sketch follows below)
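A toy JavaScript sketch of this kind of flow-based inference over assignments, for IDE hints; the flat assignment list and the use of typeof as the "type" are invented simplifications:

  // Infer a crude type for each variable from a list of assignments.
  function inferTypes(assignments) {
    var types = {};                                   // name -> inferred type string
    assignments.forEach(function (asg) {
      if (asg.from !== undefined && asg.from in types)
        types[asg.to] = types[asg.from];              // var y = x: propagate x's type
      else
        types[asg.to] = typeof asg.value;             // var x = 12.5: the literal gives the type
    });
    return types;
  }

  inferTypes([
    { to: "x", value: 12.5 },   // x: "number"
    { to: "y", from: "x" },     // y: "number" (propagated; enables autocomplete on y)
  ]);                           // -> { x: "number", y: "number" }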


Tool support for dynamic languages

What you can't solve deterministically, solve probabilistically

- Make assumptions based on what you know (e.g., variable types don't change)
- Monte Carlo methods? It's not the end of the world if you're wrong

How about refactoring/rename?

- Java/.NET can't do it perfectly, either! (What about dynamic class loading/reflection? Struts/XML configuration files? DB persistence layers?)

Conclusions:

- Still plenty of low-hanging fruit in the area of dynamic language research
- Can apply optimization theory to IDEs


Credits

Presentation assembled mainly from:

- http://code.google.com/apis/v8/design.html
- http://andreasgal.com/
- Steve Yegge's talk at the Stanford EE Dept Computer Systems Colloquium, May 2008:
  http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html
- Jeremy Condit's presentation (CS 265 Expert Topic, 23 April 2003):
  http://www.cs.berkeley.edu/~jcondit/cs265/expert.html


BACKUP SLIDES


The Java Virtual Machine

- Each frame contains local variables and an operand stack
- Instruction set:
  - Load/store between locals and the operand stack
  - Arithmetic on the operand stack
  - Object creation and method invocation
  - Array/field accesses
  - Control transfers and exceptions
- The type of the operand stack at each program point is known at compile time


Java Virtual Machine (cont'd)

Example: computes c := 2 * (a + b), with a = 42, b = 7, c = 0 initially:

  iconst 2
  iload a
  iload b
  iadd
  imul
  istore c

Operand stack after each instruction (top of stack first):

  iconst 2   ->  2
  iload a    ->  42, 2
  iload b    ->  7, 42, 2
  iadd       ->  49, 2
  imul       ->  98
  istore c   ->  (empty; c = 98)

Lazy Code Selection

- Introduced by the Intel VTune JIT compiler ([Adl-Tabatabai 98])
- Idea: use a "mimic stack" to simulate the execution of the operand stack
- Instead of the actual values, the mimic stack holds the locations of the values

Lazy Code Selection (cont'd)

Each operand on the mimic stack is an element from a class hierarchy:

  Operand
  |- Immediate
  |   `- Constant
  |- Memory
  |   |- Field
  |   |- Array
  |   |- Static
  |   `- Stack
  |- Register
  `- FP Stack

Lazy Code Selection (cont'd)

Example (a lives in register eax, b at stack offset 4, c at stack offset 8):

  iconst 2
  iload a
  iload b
  iadd
  imul
  istore c

Mimic stack and emitted code after each instruction (top of stack first):

  iconst 2  ->  mimic stack: Imm 2                     (no code emitted)
  iload a   ->  mimic stack: Reg eax, Imm 2            (no code emitted)
  iload b   ->  mimic stack: Stack 4, Reg eax, Imm 2   (no code emitted)
  iadd      ->  mimic stack: Reg ebx, Imm 2            emits: movl ebx, eax / addl ebx, 4(esp)
  imul      ->  mimic stack: Reg ebx                   emits: sall ebx, 1  (strength-reduced *2)
  istore c  ->  mimic stack: (empty)                   emits: movl 8(esp), ebx

(a sketch of this selector follows below)
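A compact JavaScript sketch of the selector traced above; the operand descriptors, the toy instruction format, and the single scratch register ebx are simplifying assumptions (operand order follows the slides' convention, destination first):

  // Operand descriptors: where a value lives, not the value itself.
  function Imm(v)   { return { kind: "imm",   value: v, text: "$" + v }; }
  function Reg(r)   { return { kind: "reg",   text: r }; }
  function Stack(o) { return { kind: "stack", text: o + "(esp)" }; }

  function select(bytecode, locals) {
    var mimic = [], code = [];
    bytecode.forEach(function (ins) {
      switch (ins.op) {
        case "iconst": mimic.push(Imm(ins.value)); break;    // no code emitted yet
        case "iload":  mimic.push(locals[ins.name]); break;  // just record the location
        case "iadd": {
          var top = mimic.pop(), below = mimic.pop();
          code.push("movl ebx, " + below.text);              // materialize into scratch register
          code.push("addl ebx, " + top.text);
          mimic.push(Reg("ebx"));
          break;
        }
        case "imul": {
          var top = mimic.pop(), below = mimic.pop();
          // Strength reduction: only multiply-by-2 is handled in this sketch.
          if (below.kind === "imm" && below.value === 2 && top.kind === "reg") {
            code.push("sall " + top.text + ", 1");
            mimic.push(top);
          }
          break;
        }
        case "istore":
          code.push("movl " + locals[ins.name].text + ", " + mimic.pop().text);
          break;
      }
    });
    return code;
  }

  select(
    [{ op: "iconst", value: 2 }, { op: "iload", name: "a" }, { op: "iload", name: "b" },
     { op: "iadd" }, { op: "imul" }, { op: "istore", name: "c" }],
    { a: Reg("eax"), b: Stack(4), c: Stack(8) });
  // -> ["movl ebx, eax", "addl ebx, 4(esp)", "sall ebx, 1", "movl 8(esp), ebx"]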

Lazy Code Selection (cont'd)

Achieves several results:

- Converts the stack-based architecture to a register-based architecture
- Folds computations into more complex x86 instructions
- Allows additional optimizations:
  - Strength reduction and constant propagation
  - Redundant load-after-store elimination

Disadvantages:

- Extra operands spilled to the stack after basic blocks

Exception Handling

Problem: exceptions aren't thrown often, but they complicate control flow.
Solution: on-demand exception translation

- Maintain a mapping from native code addresses to the original bytecode addresses
- When an exception occurs, look up the original address and jump to the appropriate exception handler
- Results in less compilation overhead and bigger basic blocks

(a sketch of the address lookup follows below)
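A tiny JavaScript sketch of that lookup, assuming one entry per run of generated code, sorted by native start address; the addresses and field names are invented:

  var addressMap = [
    { nativeStart: 0x4000, bytecode: 0 },
    { nativeStart: 0x4010, bytecode: 3 },
    { nativeStart: 0x4025, bytecode: 7 },
  ];

  // Binary search for the last entry whose native range covers the faulting PC.
  function toBytecode(map, faultPC) {
    var lo = 0, hi = map.length - 1, hit = null;
    while (lo <= hi) {
      var mid = (lo + hi) >> 1;
      if (map[mid].nativeStart <= faultPC) { hit = map[mid]; lo = mid + 1; }
      else hi = mid - 1;
    }
    return hit && hit.bytecode;  // then dispatch to the handler covering this bytecode offset
  }

  toBytecode(addressMap, 0x4012);  // -> 3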


Exception Handling (cont'd)

Problem: exceptions are not always rare.
Solution: inlining

- Eliminate exceptions when possible:

  try {
    throw new MyException();
  } catch (MyException e) {
  }

- Use method inlining to create more opportunities


AIX JDK [Ishizaki 99]

- JIT compiler for PowerPC (32-bit RISC)
- Contributions:
  - Null check elimination
  - Array bounds check elimination
  - Global common subexpression elimination
  - Type inclusion test optimization
  - Static method call inlining
  - Dynamic method call resolution
  - ...none of which matter very much
- Each optimization is fairly effective
  - Over 50% of run-time null checks eliminated
- But the overall effects are relatively small
  - At most 10% improvement in overall execution time

