Rabbit: A Compiler for Scheme/Chapter 12

[Page 100]

90 12. Conclusions and Future Work Lexical scoping, tail-recursion, the conceptual treatment of functions (as opposed to representations thereof) as data objects, and the ability to notate "anonymous" functions make SCHEME an excellent language in which to express program transformations and optimizations. Imperative constructs are easily modelled by applicative definitions. Anonymous functions make it easy to avoid needless duplication of code and conflict of variable names. A language with these properties is useful not only at the preliminary optimization level, but for expressing the results of decisions about order of evaluation and storage of temporary quantities. These properties make SCHEME as good a candidate as any for an UNCOL. The proper treatment of functions and function calls leads to generation of excellent imperative low-level code.

We have emphasized the ability to treat functions as data objects. We should point out that one might want to have a very simple run-time environment which did not support complex environment structures, or even stacks. Such an end environment does not preclude the use of the techniques described here. Many optimizations result in the elimination of LAMBDA-expressions; post CPS-conversion analysis eliminates the need to close many of the remaining LAMBDA-expressions. One could use the macros and internal representations of RABBIT to describe intermediate code transformations, and require that the final code not actually create any closures. As a concrete example, imagine writing an operating system in_SCHEME, with machine words as the data domain (and functions excluded from the run-time data domain). We could still meaningfully write, for example:

[Page 101]

91 (IF (OR (STOPPED (PROCESS I)) (AWAITING-INPUT (PROCESS I))) (SCHEDULE-LOOP (+ I I)) (SCHEDULE-PROCESS I)) While the intermediate expansion of this code would conceptually involve the use of functions as data objects, optimizations would reduce the final code to a form which did not require closures at run time.

An experiment we would like to try would be to use CGOL [Pratt], a program which parses ALGOL-like syntax and produces LISP code, as a front end for RABBIT. The result would be a compiler for an ALGOL-like language which would produce code by the processes of parsing (by CGOL); macro-expansion, optimization, and output of MacLISP code (by RABBIT); and generation of PDP-10 machine language (by the MacLISP compiler).

Among the interesting issues we have not dealt with or have not yet implemented' in RABBIT are: compilation of data manipulation primitives, interaction of such primitives, procedure integration of the most general form, and complex register allocation. A particularly interesting issue is that of data type analysis. Such analysis would solve certain problems which cannot easily be solved now by RABBIT. For example, consider the piece of code:

(IF (OR A B) X Y) The macro-expansion and optimization phases will reduce this to:

(IF A (IF A X Y) (IF B X Y)) The difficulty is that RABBIT has no way of knowing that A is known to be non-null in the first inner IF by virtue of the testing of A in the outer IF. If it could realize this, then the code would reduce to the more reasonable:

[Page 102]

92 (IF AX (IF BX Y)) Compare this with the case of (IF (AND ...) ...) presented earlier.

One particularly nagging difficulty concerns an interaction between CATCH and optimization by substituting expressions for variables. The problem is that if an expression with a side-effect is substituted into a place which is evaluated after the return of a call to an unknown function (where it had been written at a place normally evaluated before the call), and if a CATCH is performed within that unknown function, and the escape function is subsequently called more than once, then the expression with a side-effect will be evaluated twice instead of once. There is no possible way to decide whether this can happen, other than to be fearful of all unknown function calls. In practice this defeats most optimization. we have ignored this difficulty in RABBIT. It probably indicates that escape functions are even more intractable than we had earlier' believed. It would not be so bad if we could insist that an escape function be called no more than once (or rather, that a CATCH be returned from no more than once, implying that if the escape function is used it must be dynamically within the body of the CATCH). If this restriction is enforced, or if CATCH is forbidden, then in fact no continuation can be invoked more than once, which, with other suitable restrictions, accounts for the ability of most languages to use stacks instead of trees for their control stacks.