0 оценок0% нашли этот документ полезным (0 голосов)
115 просмотров5 страниц
Modern JVM implementations interleave execution with compilation of "hot" methods to achieve reasonable performance. Compilation overhead impacts the execution time of the application and induces run-time pauses. Compilation Server is the server which compiles and optimizes Java bytecodes on behalf of its clients.
Modern JVM implementations interleave execution with compilation of "hot" methods to achieve reasonable performance. Compilation overhead impacts the execution time of the application and induces run-time pauses. Compilation Server is the server which compiles and optimizes Java bytecodes on behalf of its clients.
Авторское право:
Attribution Non-Commercial (BY-NC)
Доступные форматы
Скачайте в формате PDF, TXT или читайте онлайн в Scribd
Modern JVM implementations interleave execution with compilation of "hot" methods to achieve reasonable performance. Compilation overhead impacts the execution time of the application and induces run-time pauses. Compilation Server is the server which compiles and optimizes Java bytecodes on behalf of its clients.
Авторское право:
Attribution Non-Commercial (BY-NC)
Доступные форматы
Скачайте в формате PDF, TXT или читайте онлайн в Scribd
PG Student Professor & Head PRMIT and Research, Badnera PRMIT and Research, Badnera mrudula_nimbarte@rediffmail.com optimizing compilation of “hot” methods at run time Abstract or combine a cheap compiler with an optimizing compiler used only for hot methods. While these Modern JVM implementations interleave combined approaches result in more efficient execution with compilation of “hot” methods to execution of Java programs, they still incur the time, achieve reasonable performance. Since compilation memory, and pause time overhead of optimizing overhead impacts the execution time of the compilation. application and induces run-time pauses, it is better This optimizing compilation overhead is already to offload compilation onto a compilation server. quite significant in desktop systems, and the Compilation server is the server which compiles and situation is worse in devices where one has to optimizes Java bytecodes on behalf of its clients. It balance computing power with other attributes such provides the following benefits: (i) lower execution as battery life. With the advent of mobile computing, and pause times due to reducing the overhead of there is an increasing number of networked (wired or optimization; (ii) lower memory consumption of the wireless) devices with limited CPU speed, small client by eliminating allocations due to optimizing memory size, and/or limited battery life, where the compilation and footprint of the optimizing optimization overhead may be prohibitive. compiler; (iii) lower execution time of the Compilation Server (CS) is a server-assist application due to sharing of profile information mechanism to eliminate or reduce the compilation across different runs of the same application and overhead. CS can compile and optimize code on runs of different applications.. Compilation server is behalf of clients. CS provides the following benefits able to handle more than 50 concurrent clients while compared to the best performing Adaptive still allowing them to outperform best performing configuration: (i) lower execution and pause times adaptive configuration. by reducing the overhead of optimization; (ii) lower . memory management load on the client by eliminating allocation coming from optimizing 1. Introduction compilation and by reducing the footprint of the optimizing compiler; (iii) lower execution time of One of the key features behind the enormous the application due to sharing of profile information success of the Java programming language is its across different runs of the same application and network- and platform-independence: one can runs of different applications. CS can support over compile Java programs into platform neutral 50 concurrent clients while still allowing them to bytecode instructions and ship those instructions outperform the Adaptive configuration. The client across a network to their final destination where a performance is only slightly affected by slow Java Virtual Machine (JVM) executes them network speeds, and that profile-driven [Lindholm and Yellin 1996]. Java’s “write once, run optimizations are feasible in the context of CS. everywhere” paradigm is especially useful for mobile computing, where applications may be downloaded on demand. 2. Motivation There are at least two basic methods of executing Java bytecode instructions: (i) by interpreting the 2.1 Execution Model of Java Virtual bytecode instructions; and (ii) by first compiling the Machines bytecode instructions into native machine code instructions and then executing the resulting native Running Java programs normally consists of two code. The interpretation-only approach often results steps: converting Java programs into bytecode in poor performance because of interpretation instructions (i.e., compiling Java source to .class overheads, while the compilation-only approach files), and executing the resulting class files [Gosling introduces additional run-time overhead because et al. 2000]. Because the compiled class files are compilation occurs at run time. Thus, many JVMs network- and platform-neutral, one can easily ship use hybrid methods: combine interpretation with them across a network to any number of diverse 3. Design and Implementation of clients without having to recompile them. Compilation Server JVMs then execute these class files, and to achieve reasonable performance, state-of-theart The primary design goal of CS is to minimize JVMs, such as HotSpot [Paleczny et al. 2001] and client execution time. CS clients may include Jikes RVM [Burke et al. 1999], also perform desktop PCs, laptops, and PDAs, and thus are likely dynamic native code generation and optimization of to be limited in one form or another compared to CS, selected methods. Instead of using an interpreter, which would be equipped with plenty of memory, Jikes RVM [Arnold et al. 2000] includes a baseline fast disk drives, and fast network connection(s). (non-optimizing) and an optimizing compiler. The Therefore, it would be beneficial to allow the server baseline compiler is designed to be fast, and easy to to perform clients’ work to the extent possible. implement correctly, while the optimizing compiler is designed to produce more efficient machine code 3.1 Overview by performing both traditional compiler optimizations (such as common subexpression Figure 1 shows the overall client and CS elimination) and modern optimizations designed for architecture. Broadly speaking, there are three steps object-oriented programs (such as pre-existence- involved in using CS: based inlining [Detlefs and Agesen 1999]). In (1) Client requests download of class files via CS. addition to providing flags to control individual In this respect, CS acts as a proxy server that optimizations, Jikes RVM provides three relays client’s request to the outside world. optimization levels: O0, O1, and O2. The lower However, unlike a regular proxy server, CS levels (O0 and O1) perform optimizations that are may choose to annotate the class files with fast (usually linear time) and offer high payoff. For profile directives. These directives instruct the example, O0 performs inlining, which is considered client as to which methods or which basic to be one of the most important optimizations for blocks within a method it should instrument object-oriented programs. O2 contains more and gather profile information for profile- expensive optimizations such as ones based on static driven optimizations. single assignment (SSA) form [Cytron et al. 1991]. (2) Client compiles and executes downloaded methods with the baseline compiler. The 2.2 Compilation Pause Times modified adaptive subsystem identifies “hot” methods and sends optimization requests to In addition to affecting overall running time of CS. These requests are accompanied by various applications, dynamic compilation also affects their information. responsiveness because it induces pauses in program (3) The front-end for CS, which we call the CS execution. The use of pause times as a performance driver, stores individual client state data. The metric is popular when evaluating garbage collection driver is also responsible for managing profile algorithms [Cheng and Blelloch 2001], and it should information, reusing optimized code, driving be equally important in evaluating dynamic the optimizing compiler, and sending compilation systems. For example, H¨olzle and optimized machine code back to the clients. Ungar [H¨olzle and Ungar 1994] uses the concept of The installer threads on clients are responsible absolute pause times to evaluate the responsiveness for receiving and installing optimized code. of the SELF programming system. Since CS also acts as a proxy server as depicted 2.3 Memory Usage in Figure 1, make another simplifying assumption, namely that CS has access to all client class files. Performing code optimizations consumes memory Therefore, client requests for method optimization and thus may degrade memory system performance. do not need to send the bytecode instructions to CS. Since Java programs will be optimized at run time, However, since the size of the bytecode instructions there are two memory costs for optimizations: (i) the is usually small, and network speed does not affect data space cost, i.e., the space required by the client performance very much. This assumption is optimizer to run; and (ii) the instruction space cost, not likely to affect client performance. i.e., the footprint of the optimizer. (The final size of optimized code may be larger or smaller than 3.2 Client Design and Implementation unoptimized code, but this size effect is much smaller than the other two.) 3.2.1 Method Selection. One of the most important aspects that affects client performance is selecting Fig 1. Overview of CS-client interaction
callees to CS. The reason for sending state
which methods to optimize on CS. Although it information of m’s callees to CS is because the would be less demanding on clients to have CS select which methods to optimize, having the server optimizing compiler may decide to inline them into choose does not work well because CS does not have m and thus needs to know about their state enough run-time information to make the best information. The collection of client state relies on decision. the baseline compiler, and thus if a callee has not yet While it is more efficient to optimize methods been compiled by the baseline compiler, its state using CS rather than on the client, there still is some information is not sent to CS. CS needs to know cost associated with using the server, such as some things about the state of the client VM. To collection of client state information, the overhead of keep it simple, client state information is sent as communication, and installation of compiled name (string), kind (byte), and value (integer) triples, methods. Optimizing too many methods, i.e., too except for profile information. This representation is eagerly, may increase this cost so significantly that it not optimized for size but for simplicity. It is will offset any benefit gained by optimization. In any possible for CS and the client to agree upon a more case it is impractical to optimize all methods on CS compact representation such as an index into the since it would overload the server. On the other constant pool table in a class file. The size of client hand, optimizing too few hot methods may eliminate state information is small and does not impact client any performance benefit. performance much. To avoid resending information, Thus the benefit model remains constant CS maintains per-client state information. throughout client execution. The cost model is dynamic. The cost is expressed in terms of the 3.2.3 Mode of Communication. Another important number of bytecodes compiled per unit time, and design dimension that can affect client performance each individual client maintains their own cost is the way it communicates with CS. Both the model. amount of transmitted data and the mode of communication (synchronous or asynchronous) can 3.2.2 Client State. Once a method m is identified as impact how clients perform. Offloading compilation being “hot” using our modified cost-benefit model, to a server means that clients can continue executing the adaptive controller issues a compilation request unoptimized machine code while they are waiting for to CS (step 2 in Figure 1). In addition to sending a optimized code to arrive from the server. It may be description of m, the adaptive controller also collects beneficial to wait for the server for long running and sends client state corresponding to m and its methods since waiting and executing optimized code may be shorter than executing unoptimized code followed by optimized code. Furthermore, the may later be recompiled using a higher optimization problem with long running methods can also be level, if that is estimated to be beneficial. Since using addressed by performing on-stack replacement [Fink the same multi-level recompilation strategy in CS and Qian 2003] of compiled methods. The use of may result in higher server load and increased asynchronous communication also has the benefit of network traffic. These optimizations have high reducing client pause times since a client needs to potential for benefit with low to moderate pause only to send the request and to receive and compilation cost. install compiled code. Split transaction is to split an optimization request in two transactions as described 3.3.2 Speculative Optimizations. One may need to below. The idea behind split transaction is reduce apply some speculative optimizations, such as wait time for optimized code by sending the request guarded inlining, in order to generate best quality as early as possible. In the first transaction, client code. However, speculative optimizations make would identify “getting-warm” methods and send some assumptions about the state of the world, and requests for these methods to CS. In the second these assumptions need to hold for correct execution transaction, the client would identify “actually-hot” of programs. The Adaptive system maintains and methods and send requests for these methods to CS. checks the validity of these assumptions, and when Since the set of methods in “gettingwarm” would these assumptions are violated, it re-optimizes the include most, if not all, of the methods in “actually- method in question. The same approach is used in hot”, CS would simply send previously optimized the CS setting: in order to accommodate speculative code back to the client without additional delay. In a optimizations, the server sends a list of class names setting where CS is more capable and relatively idle, and method descriptions back to the client that affect or in situations where optimization or analysis is the validity of a compiled method. The client is then much more expensive, split transactions may be responsible for checking that these classes and effective at reducing client wait time. methods are not sub classed and overridden, respectively. Even though CS has access to all the 3.2.4 Security and Privacy. Any client-server classes a client may execute, CS cannot perform architecture is affected by security and privacy these checks since it does not know which classes a issues. Questions such as “How can clients trust client may load dynamically in the future. CS could CS?” or “What kinds of privacy policies should be assume that any class that a client has access to could implemented in CS?” are interesting but beyond the potentially be sub classed and its methods scope of study. Assume that clients can trust CS to overridden, but that would result in code that is too send correctly optimized code and that CS can trust conservative. If a client finds that these assumptions its clients to send valid profile information. Also are violated, the server sends non-speculatively assume that other appropriate security and privacy optimized code to the client upon request. policies are in place, which is a reasonable assumption to make if both client and server are 3.3.3 Inlining. One of the most effective behind a firewall, for example. optimization method is inlining [Lee et al. 2004], and CS sometimes is not as aggressive as the 3.3 Server Design and Implementation Adaptive system in inlining, due to its limited knowledge about client states. In order for CS to As shown in Figure 1, CS implementation inline a method m1 into another method being consists of a driver that mediates between clients and compiled, m2, CS needs to have full information the optimizing compiler. The CS driver receives and about m1. Unless m1 has been baseline compiled on stores per-client state information, segregated by the client already, it is not possible for CS to know kind so as to speed lookups by the optimizing about its state. Because clients ask only for “hot” compiler. It is also responsible for combining profile methods to be compiled by CS, there is a high information received from its clients. The CS driver probability that most of the frequent method also drives the optimizing compiler and relays invocations contained within these “hot” methods optimized code back to the clients. In addition to have already been executed (and therefore compiled machine code, it also sends GC maps, inline maps, by the Baseline compiler) at least once, and thus all bytecode maps, and the list of classes that must not of the frequent call sites can be inlined by CS. The be subclassed and methods that must not be fact that CS cannot inline as aggressively as the overridden back to its clients. optimizing compiler in Opt or Adaptive may seem like a disadvantage, but in fact, it is not, because the 3.3.1 Optimization Selection. The Adaptive system filtering of call sites that happens on clients works as allows multi-level recompilation using the various a primitive feedback-directed adaptive inlining optimization levels. For example, a method which mechanism. Due to this filtering, CS does not bother has been recompiled using optimization level O0 to inline infrequently executed call sites; inlining them may result in worse instruction cache behavior. 5. References 3.3.4 Linking. While producing generic machine Arnold, M., Fink, S., Grove, D., Hind, M., and Sweeney, P. 2000. Adaptive optimization in the Jalape˜no JVM. In Proceedings of code that can be used on different clients has its the Conference on Object-Oriented Programming, Systems, benefits, it can be costly because each individual Languages, and Applications.ACM Press, Minneapolis, MN, client must link the machine code before executing 47–65. it. Therefore CS produces optimized machine code Burke, M., Choi, J.-D., Fink, S., Grove, D., Hind, M., Sarkar, V., Serrano, M., Sreedhar, V. C.,and Srinivasan, H. 1999. The that clients can simply download, install, and execute Jalape˜no dynamic optimizing compiler for Java. In ACM Java without any modification. For example, if a static Grande Conference. ACM Press, San Francisco, CA, 129–141. field is referenced in a method, the server will use Cheng, P. and Blelloch, G. E. 2001. A parallel, real-time garbage the offset of the corresponding static field obtained collector. In Proceedings of the ACM SIGPLAN 2001 from the client to generate code for that particular Conference on Programming Language Design and Implementation. ACM Press, Snowbird, UT, 125–136. method. Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., and Adeck, F. K. 1991. Efficiently computing static single assignment form 3.3.5 Compiled Code Reuse. The issue of code and the control dependence graph. ACM Transactions on reuse is to reduce compilation time and server load Programming Languages and Systems (TOPLAS) 13, 4, 451– 490. even further. To reuse code, CS makes a scan of Detlefs, D. and Agesen, O. 1999. Inlining of virtual methods. In generated machine code to find patch points: patch Proceedings of the 13th European Conference on Object- points are name (string) and machine code offset Oriented Programming. Springer-Verlag, Lisbon, Portugal, (integer) pairs that describe which bytes have to be 258–278. Fink, S. J. and Qian, F. 2003. Design, implementation, and patched before compiled code can be delivered to evaluation of adaptive recompilation with onstack replacement. another client. When a request for the same method In Proceedings of the International Symposium on Code comes in, CS simply walks over the list of patch Generation and Optimization.IEEE Computer Society, San points modifying the compiled code as necessary. Francisco, California, 241–252. Gosling, J., Joy, B., Steele, G., and Bracha, G. 2000. Java However, it is sometimes not enough to patch Language Specification, Second Edition: The Java Series. compiled code in this manner because there may be Addison-Wesley Longman Publishing Co., Inc., Boston, MA. other differences between clients. For example, H¨olzle, U. and Ungar, D. 1994. A third-generation SELF depending on whether a static field referenced in a implementation: reconciling responsiveness with performance. In Proceedings of the Ninth Annual Conference on Object- method has been resolved on the client, the server oriented Programming Systems, Language, and Applications. may need to generate additional code to resolve the ACM Press, Portland, OR, 229–243. field before accessing it. In the case where clients’ Lee, H. B., Von Dincklage, D., Diwan, A., and Moss, J. E. B. states differ, the server simply re-optimizes the 2004. Understanding the behavior of compiler optimizations. Tech. Rep. CU-CS-972-04, University of Colorado, Dept. of method in question. Another approach would be to Computer Science, CB 430, Boulder, CO 80309-0430. Mar. generate lowest common denominator code that Lee, H. B., Von Dincklage, D., Diwan, A., and Moss, J. E. B. could be used universally across different clients, but 2007. Design, implementation, and evaluation of a compilation such code would perform relatively poorly due to server. ACM Trans. Program. Lang. Syst. 29, 4, Article 18 ( August 2007 ). additional checks. Lindholm, T. and Yellin, F. 1996. The Java Virtual Machine . Specification. Addison-Wesley, Boston, MA. 4. Conclusion Plezbert, M. P. and Cytron, R. K. 1997. Does ”just in time” = ”better late than never”? In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming To achieve reasonable performance, many Languages. ACM Press, Paris, France, 120–131. modern virtual machines resort to Just-in-time compilation. However, the cost of dynamic Prof. A.P.Bodkhe is Professor & Head of compilation even in state-of-the-art virtual machines Information Technology Department at PRMIT & Research, Badnera and Dean, Faculty of is moderately high. A compilation server compiles Engineering & Technology at Sant Gadge Baba client code at the granularity of methods to reduce or Amravati University, Amravati. He did B.E. eliminate the cost associated with dynamic (EXTC) from Karnataka Uni.,Dharwad in 1984 compilation. Compilation server is able to handle and M.E.(Adv.Electronics) from Amravati University,Amravati in 1990. He has got 24 more than 50 concurrent clients while still allowing years of teaching experience. them to outperform best performing adaptive configuration. CS is also effective at reducing clients Mrs. M. S. Nimbarte did B.E.(CSE) from GCOE end-to-end execution times, pause times, and Amravati and currently working as a lecturer since last six years in Bapurao Deshmukh college memory consumption. Being able to migrate of Engineering, Sevagram and doing part time compilation onto a remote server using CS approach M.E. from PRMIT and Research, Badnera. Her will have significant impact on the way virtual area of interests are Compilers, System Software, machines and optimizations are designed and Theory of Computation and Computer Architecture and Organization. implemented.