Rabbit: A Compiler for Scheme/Chapter 3

←

Rabbit: A Compiler for Scheme
by Guy L. Steele, Jr.

Chapter 3

→

1177432Rabbit: A Compiler for Scheme — Chapter 3Guy L. Steele, Jr.

3. The Target Language The "target language" is a highly restricted subset of MacLISP, rather than any particular machine language for an actual hardware machine such as the PDP-10. RABBIT produces MacLISP function definitions which are then compiled by the standard MacLISP compiler. In this way we do not need to deal with the uninteresting vagaries of a particular piece of hardware, nor with the peculiarities of the many and various data-manipulation primitives (CAR, RPLACA, +, etc.). We allow the MacLISP compiler to deal with them, and concentrate on the issues of environment and control which are unique to SCHEME. While for production use this is mildly inconvenient (since the code must be passed through two compilers before use), for research purposes it has saved the wasteful' reimplementation of much knowledge already contained in the MacLISP compiler.

On the other hand, the use of MacLISP as a target language does not by any means trivialize the task of RABBIT. The MacLISP function-calling mechanism cannot be used as a target construct for the SCHEME function call, because MacLISP's function calls are not guaranteed to behave tail-recursively. Since tail-recursion is a most crucial characteristic distinguishing SCHEME from most LISP systems, we must implement SCHEME function calls by more primitive methods.

Similarly, since SCHEME is a full-funarg dialect of LISP while MacLISP is not, we cannot in general use MacLISP's variable-binding mechanisms to implement those of SCHEME. On the other hand, it is a perfectly legitimate optimization to use MacLISP mechanisms in those limited situations where they are applicable.

Aside from ordinary MacLISP data-manipulation primitives, the only MacLISP constructs used in the target language are FROG, GO, RETURN, PROGN, COND, SETQ, and ((LAMBDA ...) ...). PROG is never nested; there is only a single, outer PROG. RETURN is used only in the form (RETURN NIL) to exit this outer PROG; it is never used to return a value of any kind. LAMBDA-expressions are used only to bind temporary variables. In addition, CONS, CAR, CDR, RPLACA, and RPLACD are used in the creation and manipulation of environments.

We may draw a parallel between each of these constructs and an equivalent machine-language (or rather, assembly language) construct:

PROG A single program module.

GO A branch instruction. PROG tags correspond to instruction labels.

RETURN Exit from program module.

PROGN Sequencing of several instructions.

COND Conditional branches, used in a disciplined manner. One may think of

(COND (pred1 value1) 
      (pred2 value2)
      ...
      (predn va1uen))

as representing the sequence of code

       <code for pred1> 
       JUMP-IF-NIL reg1,TAG1 
       <code for value1> 
       JUMP ENDTAG 
TAG1:  <code for pred2> 
       JUMP-IF-NIL regl,TAG2 
       <code for value2>
       JUMP ENDTAG
TAG2:  ...
       <code for predn> 
       JUMP-IF-NIL reg1,TAGn 
       <code for valuen> 
       JUMP ENDTAG 
TAGn:  LOAD-VALUE NIL 
ENDTAG:

which admits of some optimizations, but we shall not worry about this. (The MacLISP compiler does, but we do not depend at all on this fact.)

SETQ Load register, or store into memory. LAMBDA we use this primarily in the form:

((LAMBDA (ql ... qn) 
         (setq varl ql) 
         (setq varn qn)) valuel ...

which we may think valuen) of as saving values on a temporary stack and then popping them into the variables:

<code for valuel>    ;leaves result in regl 
PUSH regl
...
<code for valuen> 
PUSH regl 
POP varn 
...
POP varl

This is in fact approximately how the MacLISP compiler will treat this construct. This is used to effect the simultaneous assignment of several values to several registers. It would be possible to do without the MacLISP LAMBDA in this case, by using extra intermediate variables, but it was decided that this task was less interesting than other issues within RABBIT, and that assignments of this kind would occur sufficiently compiler to produce The form ((LAMBDA .. ) ..) user wrote such a LAMBDA-body are all CONS is used, among environment. While often that it was desirable to get the MacLISP the best possible code in this case.

CONS CONS is also used in some situation where the form in the SCHEME code, and the arguments and "trivial", in a sense to be defined later. other things, to "push" new values onto the current SCHEME variables can sometimes be represented as temporary MacLISP variables using LAMBDA, in general they must be kept in a "consed environment" in the heap; CAR and CDR are used to "index" the environment "stack" (which is not really a stack, but in general treelike). (N.B. By using CONS for this purpose we can push the entire issue of environment retention off onto the LISP garbage collector. It would be possible to use array-like blocks for environments, and an Algol-like "display" pointer discipline for variable access. However, a retention strategy as opposed to a deletion strategy must be used in general, because SCHEME, unlike Algol 60 and 68, permits procedures to be the values of other procedures.

Stack allocation does not suffice in general -- a heap must be used.

Later we will see that RABBIT uses stack allocation of environments and a deletion strategy in simple cases, and reverts to heap allocation of environments and a retention strategy in more complicated situations.)

CAR. + Primitive MacLISP operators such as + and CAR are analogous to machine-language instructions such as ADD and LOAD-INDEXED. We leave to the MacLISP compiler the task of compiling large expressions involving these; but we are not avoiding the associated difficult issues such as register allocation, for we shall have to deal with them in compiling calls to SCHEME functions.