Tutorials at ICS '99



Tutorial 1:

Java for High-Performance Computing

TUTORIAL ANNOUNCEMENT
June 19, 1999

Morning Session

Java Programming and JIT-Compilation of Java for the Intel Architecture

Aart Bik, Milind Girkar, Mohammad Haghighat
Microcomputer Research Labs
Intel Corporation

The High-Performance Java tutorial day will take place on June 19, 1999, in conjunction with ICS'99. The day is dedicated to two tutorials on Java programming, compilation, and system support. The tutorials will discuss several optimization techniques that improve the performance of Java programs.

Target audience: Java VM/JIT developers, application programmers.
Assumed background: Java language, compilation techniques.

Part 1: Java programming
Length: 1 hour

Introduction
what is Java, history of Java, comparisons with C/C++, sample programs, development environment.
Language
basic types, strings, arrays, objects, statements, operators, classes, packages, input/output, variables, scope rules.
Inheritance, interfaces
virtual functions, superclass dependencies, runtime class loading, interfaces.
Exceptions
error handling, throwable objects, catch or specify requirement, finally clause, exception hierarchy.
Threads
thread context, priorities, synchronization, monitors, wait and notify, thread states, examples (a short sketch follows this outline).
Java virtual machine
architecture, registers, operand stack, instruction set, garbage collection, verification.
Applets, Graphics, Networking
overview.
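
As a point of reference for the exceptions and threads topics above, here is a minimal, self-contained sketch; the class and method names are illustrative and not taken from the tutorial materials.

    public class TutorialSample {
        private int counter = 0;

        // Entering a synchronized method acquires the object's monitor.
        public synchronized void increment() { counter++; }
        public synchronized int get() { return counter; }

        public static void main(String[] args) {
            final TutorialSample sample = new TutorialSample();
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000; i++) sample.increment();
                }
            });
            worker.start();
            try {
                worker.join();  // may throw InterruptedException
            } catch (InterruptedException e) {
                System.err.println("interrupted: " + e);
            } finally {
                // The finally clause runs whether or not an exception was thrown.
                System.out.println("counter = " + sample.get());
            }
        }
    }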

Part 2: Java JIT-compilation
Length: 3 hours

Overview
VM-JIT interaction, JIT-GC interaction, bytecode-to-IR translation, IR-to-machine-IR translation, code generation and formatting
Machine-independent Optimizations
CSE, loop-invariant code motion, inlining, speculative inlining, method specialization, multiversion code (a source-level sketch follows this outline)
Machine-dependent Optimizations
Local/Global register allocation, instruction selection, instruction scheduling, code/data alignment
Java specific Optimizations
Range-check elimination, checkcast/instanceof removal, type propagation, optimizations enabled by strong typing
Garbage Collection
Overview (precise, imprecise, copying, generational, incremental), analysis required for precise GC
Vectorization
String operations, supporting Intel MMX Technology
Supporting performance analysis tools
Intel VTune
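
To make one of the machine-independent optimizations above concrete, the following source-level sketch shows common-subexpression elimination combined with loop-invariant code motion. A JIT applies the same transformation to its intermediate representation rather than to Java source, and all names below are illustrative.

    public class LicmSketch {
        static double[] scale(double[] a, double x, double y) {
            double[] r = new double[a.length];
            // Before the transformation, each iteration recomputes x * y:
            //   for (int i = 0; i < a.length; i++) r[i] = a[i] * (x * y);
            // After CSE and loop-invariant code motion, the invariant
            // product is computed once, outside the loop:
            double xy = x * y;
            for (int i = 0; i < a.length; i++) {
                r[i] = a[i] * xy;
            }
            return r;
        }

        public static void main(String[] args) {
            double[] out = scale(new double[] {1.0, 2.0, 3.0}, 2.0, 4.0);
            for (int i = 0; i < out.length; i++) {
                System.out.println(out[i]);
            }
        }
    }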

ABSTRACT

The Java model achieves security and platform independence by translating and distributing applications as Java bytecodes, the instructions of the Java Virtual Machine (JVM). In the early releases of the Java Development Kit, bytecodes were interpreted by the JVM. As Just-In-Time compilation technology has matured, JIT compilers have become integral parts of JVM implementations. By compiling bytecodes into the host machine's instruction set and optimizing the code on the fly, JIT compilers significantly improve the performance of Java applications, in some cases even matching the performance of statically compiled languages. Given that compilation takes place during the execution of an application, JIT-compiler designers face the major challenge of identifying cost-effective optimizations and implementing them efficiently.

In this tutorial, a team from Intel's Microcomputer Research Labs that has developed a Java JIT-compiler for the Intel Architecture presents its experiences and results. We start with an overview of the Java language and the components of a Java environment. We give detailed descriptions of the Java constructs that have a direct impact on compilation. Covered topics include classes and objects, interfaces, threads, exception handling, and garbage collection. Next, we give a comprehensive and in-depth presentation of the structure and optimizations of the state-of-the-art JIT-compiler that we have developed. This includes JVM-JIT and JIT-GC interactions, intermediate representations for translating bytecodes for the Intel Architecture, optimization techniques at various levels, exception handling, and garbage collection. We further discuss a spectrum of high-level optimizations such as CSE, loop-invariant code motion, elimination of range checks and cast checks, inlining, speculative inlining, method specialization, multiversion code, optimizations enabled by strong typing, and hardware-assisted exception handling. We also present the techniques and heuristics that we used for low-level optimizations such as local and global register allocation, instruction selection and scheduling, and code and data alignment. Moreover, we share our experiences with advanced vectorization techniques in support of Intel's MMX Technology, and demonstrate VTune, Intel's performance analysis tool, which assisted us in designing our optimizations.
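
As a concrete illustration of the speculative inlining mentioned above, the sketch below shows, in source form, the guarded fast path a JIT might generate when profiling suggests a virtual call almost always dispatches to one class. All class and method names here are illustrative, not the tutorial's actual code.

    abstract class Shape { abstract double area(); }

    final class Circle extends Shape {
        double r;
        Circle(double r) { this.r = r; }
        double area() { return Math.PI * r * r; }
    }

    public class SpeculativeInlining {
        // What the programmer writes: a virtual call on every iteration.
        static double totalArea(Shape[] shapes) {
            double sum = 0.0;
            for (int i = 0; i < shapes.length; i++) {
                sum += shapes[i].area();    // virtual dispatch
            }
            return sum;
        }

        // Roughly what a JIT may produce when the receiver is almost
        // always a Circle: an inlined fast path guarded by a class test,
        // with the original dispatch kept as the recovery path.
        static double totalAreaSpecialized(Shape[] shapes) {
            double sum = 0.0;
            for (int i = 0; i < shapes.length; i++) {
                Shape s = shapes[i];
                if (s instanceof Circle) {
                    Circle c = (Circle) s;          // guard succeeded: inlined body
                    sum += Math.PI * c.r * c.r;
                } else {
                    sum += s.area();                // slow path: normal dispatch
                }
            }
            return sum;
        }

        public static void main(String[] args) {
            Shape[] shapes = { new Circle(1.0), new Circle(2.0) };
            System.out.println(totalArea(shapes) == totalAreaSpecialized(shapes));
        }
    }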


Afternoon Session

Static and Dynamic Optimized Compilation of Java Programs

Vivek Sarkar and Manish Gupta
IBM T.J. Watson Research Center

Target Audience: Java VM and compiler developers.
Assumed Background: Java programming language, basic compiler technology
Length: 4 hours

Outline of the tutorial:

Part 1: Dynamic optimizations
Length: 2 hours
Introduction
Performance of Java programs, comparison with C/C++, Java semantics and their impact on optimizations.
Alias analysis for Java:
Pointer-induced alias analysis for Java, comparison with C/C++.
Optimizations from alias analysis:
Synchronization optimization and stack allocation of heap data (a short sketch follows this outline).
Speculative optimization:
Optimization of methods with respect to the class hierarchy, specialization of methods with respect to data values, optimistic assumptions about exceptions, recovery techniques for correctness of speculation.
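
A brief sketch of the synchronization optimization and stack allocation discussed above (the method name is illustrative): if analysis proves an object never escapes the thread that created it, its lock operations can be elided and its storage may be placed on the stack.

    public class EscapeSketch {
        // StringBuffer methods are synchronized, but this buffer is local
        // to the method and never escapes, so a compiler may elide every
        // lock acquisition and allocate the buffer storage on the stack.
        static String label(int n) {
            StringBuffer buf = new StringBuffer();  // non-escaping object
            buf.append("item-");                    // synchronized call: lock removable
            buf.append(n);
            return buf.toString();                  // only the String escapes
        }

        public static void main(String[] args) {
            System.out.println(label(42));
        }
    }
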
Part 2: Static optimizations
Length: 2 hours
Numerical computing:
Java's applicability to numerical computing.
Performance inhibitors:
Exception model, absence of true multidimensional arrays (a short sketch follows this outline), no complex numbers, restricted floating-point semantics.
Techniques for high-performance numerical computing in Java:
Standard Array package, compiler techniques for creating exception-free regions, semantic expansion, modified floating-point semantics.
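
To illustrate the multidimensional-array issue noted above: Java's double[][] is an array of possibly aliased, possibly ragged row arrays, which blocks Fortran-style analysis. An array package instead keeps elements in one contiguous block with explicit indexing. The Array2D class below is only a sketch of that idea, not the actual package API.

    public class Array2D {
        private final double[] data;    // contiguous, row-major storage
        private final int rows, cols;

        public Array2D(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
            this.data = new double[rows * cols];
        }

        // Dense rectangular layout: the compiler can reason about one
        // underlying array instead of a chain of possibly aliased rows.
        public double get(int i, int j) { return data[i * cols + j]; }
        public void set(int i, int j, double v) { data[i * cols + j] = v; }

        public static void main(String[] args) {
            Array2D a = new Array2D(2, 3);
            a.set(1, 2, 5.0);
            System.out.println(a.get(1, 2));
        }
    }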

ABSTRACT:

In this tutorial, we describe the state of the art in optimized compilation of Java programs. In the first half of the tutorial, we discuss issues and techniques related to dynamic optimization of general Java programs. As motivation, we show a breakdown of execution time for some Java benchmark programs, and explain why these Java programs run slower than comparable C/C++ programs. Several aspects of Java semantics complicate traditional optimizations, e.g., exceptions, garbage collection, threads, synchronization, and dynamic class loading. We describe aggressive models of these semantic features that allow more code motion and optimization to occur than simpler models do. On the positive side, the semantics of pointers (object references) in Java are more restricted than the semantics of pointers in C/C++ programs. We describe algorithms for pointer-induced alias analysis for Java that are more precise and efficient than the corresponding algorithms for C/C++, and we discuss two key optimizations for Java that use the results of pointer-induced alias analysis: synchronization optimization and stack allocation of heap data. Finally, we briefly mention some speculative optimizations that are well suited to dynamic compilation, e.g., optimization of methods with respect to the (dynamic) extant class hierarchy, specialization of methods with respect to data values, and optimistic assumptions about exceptions. We also outline recovery techniques that guarantee correctness when speculative assumptions are violated.

In the second half of the tutorial, we discuss techniques for static compilation and optimization of scientific programs written in Java. The growing popularity of Java has led to a great deal of interest in using it for numerically intensive computing, even though it was not originally intended for applications in that domain. We describe the important factors that inhibit the performance of scientific programs written in Java, such as (i) precise exceptions and mandatory checks for null pointers and out-of-bounds array accesses, (ii) the absence of true multidimensional arrays, (iii) the lack of support for a complex data type, and (iv) restrictions in floating-point semantics. We first discuss solutions that allow high performance, approaching the levels of Fortran and C/C++, to be delivered without compromising the purity of Java. This is achieved through a combination of class libraries, including a standard array package, and compiler techniques such as the creation of exception-free regions and the semantic expansion of known method calls. Each of these solutions is discussed in depth, with supporting data based on performance measurements. We shall also discuss proposals that seek to modify elements of the floating-point semantics of Java in order to achieve higher performance.
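
As a rough illustration of the exception-free regions mentioned above, the sketch below mimics, in source form, the loop versioning a compiler performs internally: one copy of the loop is guarded so that every access is provably safe and per-iteration checks can be dropped, while the fallback copy preserves Java's precise exception semantics. All names here are illustrative.

    public class SafeRegionSketch {
        static double sum(double[] a, int lo, int hi) {
            double s = 0.0;
            if (a != null && lo >= 0 && hi <= a.length) {
                // Exception-free region: every access below is provably in
                // bounds, so per-iteration checks can be eliminated and the
                // loop body reordered or unrolled.
                for (int i = lo; i < hi; i++) s += a[i];
            } else {
                // Fallback path keeps Java's precise exception semantics.
                for (int i = lo; i < hi; i++) s += a[i];
            }
            return s;
        }

        public static void main(String[] args) {
            System.out.println(sum(new double[] {1, 2, 3, 4}, 1, 3));
        }
    }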