Certifying compilation (certificerende oversættelse)

Background

The purpose of code certification is to enable safe, efficient execution of not-so-trusted code in a more trusted environment. Typical examples include running applets and/or plug-ins downloaded from the net in a web browser or mobile phone, or running user-written device drivers, schedulers, or similar code in the context of an operating-system kernel.

The traditional approach to safe execution is a combination of restricting the language (so that only safe programs can be expressed) and sandboxing (checking all potentially unsafe operations at runtime). For example, in the Java Virtual Machine, mobile code is expressed in a specialized, object-oriented bytecode language, while operations such as array indexing are checked at runtime. However, to achieve good performance, this approach requires sophisticated implementation technology in the code client (e.g., an optimizing JIT compiler), which is at odds with keeping the trusted computing base small and easily auditable.

A recent trend in language technology is therefore to require all untrusted code to be equipped with a machine-checkable certificate, guaranteeing that the code satisfies the trusted environment's safety policy, such as never accessing memory outside of some preallocated space. If the code certificate checks out, the code can be run natively, without extra runtime checks; otherwise, the code is summarily rejected.

Unlike the simpler notion of code signing (which merely guarantees that the code has been produced or approved by the signing entity, such as a program, person, or organization), a certificate actually encodes a formal proof that the code has the claimed property. The certificate can therefore be verified independently, without placing any trust in the original certifier. Moreover, for many properties of interest, verifying the certificate can be done very quickly, and by a piece of code much simpler than a typical JIT compiler. This is important, since the verification has to be performed by the client, which may have limited computing resources.

Instead, most of the hard work is shifted to the code producer (i.e., the programmer and/or the compiler), which now has to not only produce the code, but also certify its safety. It turns out, however, that these tasks often go hand in hand: if the generated code is in fact safe, it is often easy to formally say why. And the compiler can always trivially ensure that the code is safe, by including the necessary runtime checks itself -- unless it can prove that they are unnecessary. For example, if it has already been checked that an array index is within bounds, or that a pointer is non-null, it is not necessary to recheck this on on every reference; instead, the proof can just point back to the original check.

For more background on proof-carrying code, see George Necula's web page on the topic.

Project suggestions

While developing a practical certifying compilation system for a realistic language is clearly too large a task for a bachelor's project, it is entirely feasible to construct non-trivial scaled down systems, or individual components of such systems. Examples of such projects include:

Build a model proof-carrying code tool chain -- including the certifying compiler and proof checker -- for sufficiently simple source and target language, and safety property.
Develop certification systems for particular properties: memory safety (the program does not read/write data outside its allocated space), control safety (the code does not jump to potentially unsafe program addresses), concurrency safety (the code does not access some region of memory without holding a lock for that region), resource safety (the code, such as an interrupt handler, always runs within a bounded amount of time or stack space), etc.
Develop systems for constructing and representing sophisticated certificates (automated theorem proving and checking). For example, for eliminating array bounds checks, the compiler often needs to reason about the properties of integer arithmetic (e.g., if x <= y and y < 10 then x+y < 20), or related structures (such as 32-bit arithmetic or IEEE floating-point arithmetic). Likewise, representing the deduction steps used to derive a result like the above, requires agreement between the code producer and client on what constitutes a rigorous formal proof.

The project will typically consist of a combination of implementation work and more theoretical investigations, e.g., proving that any program certified according to some particular system will indeed execute safely. However, the weighting of the parts can depend largely be the students' interests and backgrounds. Students are most welcome to present their own ideas for projects, or may be inspired by some previous student projects within the area.

Although it is not a formal requirement, it is recommended that students considering a project on certified compilation have already passed the DIKU Compilers course (OV), and one of Programming Languages (Programmeringssprog) or Logic in Computer Science (Logik i datalogi).

Formalities

The topic is well suited for group projects, e.g., with some members concentrating on the compilation part, while others work primarily on the certificate-verifier (which is smaller and simpler, but safety-critical). However, single-person groups are also feasible, especially for more theoretically oriented projects.

[Vejledningen kan naturligvis foregå på dansk, ligesom projektrapporten valgfrit kan skrives på dansk eller engelsk. Ovenstående beskrivelse er affattet på engelsk af hensyn til evt. udenlandske gæstestuderende.]

Andrzej Filinski

Last modified: Wed Aug 22 13:33:27 CEST 2007