Dynamic Compilation in Practice: From Instrumentation to Parallelization

Thursday, March 13, 2008 - 11:24am

Chi-Keung (CK) Luk
CE Program Faculty Candidate – Intel

DATE: Monday, Mar 17th
TIME: 9:00 AM
PLACE: Engineering Science Building 2001

As the complexity of computer systems continues to grow, the job of a compiler is increasingly challenging as it strives to generate optimized code for many different runtime environments. One promising approach to addressing this challenge is dynamic compilation, a technique that compiles a program while it runs and thus has access to complete runtime information. Despite its potential, the success of dynamic compilation has been largely limited to providing architectural compatibility; the best-known example is the translation of Java bytecode to native machine code. Looking forward, I argue that dynamic compilation should be used more heavily in the software stack. In this talk, I present two new uses of dynamic compilation that originated from my research at Intel, namely instrumentation and parallelization.

In the first part, I will present the Pin instrumentation system, which has become very popular for developing architectural and program analysis tools. By using dynamic compilation to insert instrumentation code on the fly, Pin can perform very fine-grained monitoring of the architectural state of a program. Its simple-to-use API lets programmers easily write a variety of tools, ranging from cache simulators to memory-leak checkers to data-race detectors. I will discuss the dynamic compilation techniques behind Pin. In addition, I will present an extension of Pin called PinOS, which performs whole-system instrumentation (i.e., of both the OS and applications) by using a novel combination of dynamic compilation and virtualization techniques.

In the second part, I will present the Qilin parallel programming system, a research prototype that I am building to exploit the hardware parallelism available on machines with a multicore CPU and a GPU. Qilin provides a C++ API for writing data-parallel operations so that the compiler is relieved of the difficult job of extracting parallelism from serial code. At runtime, the Qilin compiler automatically partitions these API calls into tasks and intelligently maps these tasks to the underlying hardware. Preliminary results show that our parallel system can achieve significant speedups (above 10x) over the serial case for some important computation kernels.
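
As a rough illustration of the partition-and-map idea described above, the sketch below splits one data-parallel operation (an element-wise vector add) into two tasks, one per device. It is not Qilin's actual API: the names Device, Task, partition, and vector_add are hypothetical, and splitting the work in proportion to assumed device throughputs is only one plausible mapping policy, chosen here for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names here are hypothetical; this is not Qilin's actual API.
enum class Device { CPU, GPU };

// One task: apply the operation to indices [begin, end) on `device`.
struct Task {
    Device device;
    std::size_t begin;
    std::size_t end;
};

// Split n elements between the two devices in proportion to their assumed
// throughputs (elements per second). A real system would have to measure
// or estimate these rates; here they are simply passed in.
inline std::vector<Task> partition(std::size_t n, double cpu_rate, double gpu_rate) {
    assert(cpu_rate >= 0.0 && gpu_rate >= 0.0 && cpu_rate + gpu_rate > 0.0);
    double cpu_share = cpu_rate / (cpu_rate + gpu_rate);
    std::size_t cut = static_cast<std::size_t>(static_cast<double>(n) * cpu_share);
    if (cut > n) cut = n;  // guard against floating-point rounding
    return {Task{Device::CPU, 0, cut}, Task{Device::GPU, cut, n}};
}

// Element-wise vector add expressed as tasks. In this sketch the tasks run
// one after another on the host; because they write disjoint index ranges,
// a real runtime could dispatch them to the CPU and GPU concurrently.
inline std::vector<float> vector_add(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     double cpu_rate, double gpu_rate) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (const Task& t : partition(a.size(), cpu_rate, gpu_rate)) {
        for (std::size_t i = t.begin; i < t.end; ++i) out[i] = a[i] + b[i];
    }
    return out;
}
```

For example, calling partition(100, 1.0, 3.0) assigns the first quarter of the elements to the CPU task and the remainder to the GPU task, since the GPU is assumed to be three times faster.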

At the end, I will outline my future work.

Chi-Keung (CK) Luk is currently a Senior Staff Researcher in the Software Pathfinding and Innovation Group at Intel, where he conducts research and advanced development in parallel programming, dynamic compilation, computer architecture, program analysis tools, and virtualization. Most recently, he founded the Qilin parallel programming system project and the PinOS whole-system instrumentation project. He was also a core developer of both the Pin dynamic instrumentation system and the Ispike Itanium binary optimizer.

CK obtained his Ph.D. from the University of Toronto, under the supervision of Todd Mowry. He also spent two years as a visiting scholar at Carnegie Mellon University. He has over 20 publications and one issued patent with another five pending. He has served on the program committees of WBIA’05, MSP’02, and MICRO’01.

Among the honors CK has received, he is most proud of the Intel Achievement Award (the most prestigious award at Intel), which he received in 2008 for his contributions to Pin, and of his nomination for the ACM Doctoral Dissertation Award in 2000.

HOST: Malgorzata Marek-Sadowska