Code Generation is the step where parsed expressions and programs are turned into code blocks and final output.
The Expression Structure
We will define a simple data type to represent our output code. The datatype defines several fields: program output, text output, data output, and program context. There are also fields for stack-frame construction and destruction.
type Expr (frame program unframe text data ctx);
The compile-expr
Function
The compile-expr
function is a fairly simple case-by-case enumeration of how to compile each type of expression. The one trick that is applied on this function is the "utilization" parameter. A utilization is either Used, Unused, or Tail. This parameter exists for an optimization called "Unused Value Elimination" where values that do not get used will never be constructed. In practice this optimization can reduce memory pressure significantly.
compile-expr := λctx e used. (
(local is_tail_safe)
(local return)
(match e (
... enumerate expression codegen cases ...
))
(match (is_tail_safe used) (
( (() Tail)
(set return (set.program return ((.program return) (call tail))))
)
))
return
);
Codegen this
Expressions
The $_
or "this" expression refers to the current expression. By not changing the current expression, effectively we yield the current expression. Therefore no operation needs to happen, this expression can be empty.
( (Var $_) (Expr () () () ctx) )
Codegen local
Expressions
The meaning of the local variable declaration is implemented mostly by our push-locals
helper function. All that needs to be done here is add this local to the context. The declare-local
helper will add a context binding for (Get n)
and (Set n)
to tell us where the variable exists in the stack frame.
( (App (Var 'local) (Var n))
(set is_tail_safe True)
(set ctx (declare-local ctx n))
(Expr () () () ctx)
)
Codegen set
Expressions
The set
expression assigns a new value to a variable. There are two cases for stack locals vs heap globals.
( (App (App (Var 'set) (Var v)) e )
match (lookup ctx (Set v)) (
(assign
(compile-expr ctx e Used)
assign
)
(err fail err)
)
)
Codegen match
Expressions
Match expressions are probably the most complicated part of the entire compiler. A match expression is somewhat like the opposite of an S-Expression. Pattern matching destructures expressions into one of several potential cases.
( (App( (App( (Variable 'match) t )) p )) (
(local label_skip)
(set label_skip (uuid()))
(compile-expr t Used)
(yield-patterns p)
(cmp $0 %rsi)
(jne label_skip)
(yield-nil ())
(label_skip ':)
))
The yield-patterns
function steps through each case of the pattern that could potentially match. The yield-patterns
function then calls the destructure-pattern-lhs
function which does the destructuring. There are only 5 cases of S-Expression destructuring: wildcard match, variable binding, Nil, Literal, and Cons.
Codegen while
Expressions
While expressions repeatedly evaluate an expression with a body expression if the condition is true.
( (App( (App( (Variable 'while) c )) d )) (
(local label_while_start)
(set label_while_start (uuid()))
(local label_while_end)
(set label_while_end (uuid()))
(label_while_start ':))
(compile-expr c Used)
(cmp $0 %r12)
(je label_while_end)
(compile-expr d Unused)
(jmp label_while_start)
(label_while_end ':)
))
Codegen if
Expressions
If expressions return one expression or another depending on a condition.
( (App( (App( (App( (Variable 'if) c )) t )) f )) (
(local label_if_true)
(set label_if_true (uuid()))
(local label_if_end)
(set label_if_end (uuid()))
(compile-expr c Used)
(cmp $0 %r12)
(jne label_if_true)
(cmp $0 %r13)
(jne label_if_true)
(compile-expr f Used)
(jmp label_if_end)
(label_if_true ':)
(compile-expr t Used)
(label_if_end ':)
))
Codegen Function Application Expressions
Function application can potentially look like a cons-cell. To tell the difference we need to look in the symbol table to check if a left-variable is a function.
( (App ((Variable fname) arg)) (
(match (get-maybe-function(ctx fname)) (
( (Function mangledname) (
(compile-expr arg Used)
(call mangledname)
))
( () (
(fail (ReferenceToUndefinedVariable fname))
))
( v (
(yield-cons v arg Used)
))
))
))
Codegen Literal Expressions
Literals need to put their data into the .data section of the program. When instantiated that data reference will be put into the %r12 register to create an Atom.
( (Literal l) (
(yield-atom l)
))
Codegen Cons Expressions
Raw Cons cells are fairly simple to create. The left and right expressions are compiled individually before being put into the %r12 and %r13 registers respectively. The used
argument here helps determine whether the cons cell is actually needed. If the cons cell will not be used, then it doesn't need to be created in the first place.
( (App (l r)) (
(yield-cons l r used)
))
Codegen Nil Expressions
The Nil expression is created by clearing the %r12 and %r13 registers.
( Nil (
(mov $0 %r12)
(mov $0 %r13)
))
Codegen Variable Expressions
Variable expressions look simple as long as we are diligent about maintaining our symbol table.
( (Variable vname) (
(get-local ctx vname)
))
Conclusion
That is all for the compiler. It should work now if everything is put together correctly!
An updated version of this article series is available on Github. The source code for the compiler is also available.
This is an Excerpt from The Bootstrap Book
The Bootstrap Book is released under the terms of the permissive MIT license. If you quote or copy the book, please make a reference. That is all I ask, please.