How to Bootstrap a Compiler for a Functional Programming Language
Chapter 2: Primitive Functions
What Is a Function?
Functions at the machine level are a bit tricky to capture as a single concept. The reason for this is because different functions may have different calling conventions. The most unifying concept of functions may be the concept of the call
instruction. This instruction takes a code position (relative "near" or absolute "far") and jumps to it. The call
instruction is meant to work in tandem with ret
to return to the callers context.
The remaining concepts of functions can be expressed as preconditions and postconditions.
preconditions are what is expected to happen before the
call
instruction.postconditions are what is expected to be true after the
ret
instruction.
What Do We Expect From a Function Inside the Compiler?
Our compiler has an exceedingly simple calling convention for functions that it uses internally.
functions can always be referenced as "near" positions
The %r8, %r9, %r10, %r11, %r12, %r13, %r14 registers can be used for arguments and return values
%r12, %r13 is the primary argument
%r12, %r13 is the primary return value
System Calls
The exception to our internal calling convention is system calls. We will target this compiler to run on Linux x86-64 systems. We require the operating system functions sys_open
, sys_close
, sys_write
, sys_read
, and sys_exit
. Our system calls can be made with a helper function like this.
system-call := λrax rdi rsi rdx. (
(mov rax %rax) # system call number
(mov rdi %rdi)
(mov rsi %rsi)
(mov rdx %rdx)
syscall # invoke the operating system
);
The print-s
function
Let's define a function to see how this looks. Here is the print-s
function that dumps an S-Expression to stdout.
print_s:
# if .head is zero, then this is Nil
(cmp $0 %r12)
(je print_s_nil)
# if only .tail is zero, then this is an Atom
(cmp $0 %r13)
(je print_s_atom)
# if .head and .tail are non-zero, then this is a Cons
(system-call $1 $1 $left_paren $1)
(push-this())
(call head)
(call print_s)
(pop-this())
(system-call $1 $1 $space $1)
(push-this())
(call tail)
(call print_s)
(pop-this())
(system-call $1 $1 $right_paren $1)
ret
print_s_nil:
# nil is two bytes "()" located in the data section at $nil_literal
(system-call $1 $1 $nil_literal $2)
ret
print_s_atom:
(call strlen) # %r8 is string length of this atom
(system-call $1 $1 %r12 %r8)
ret
The strlen
function
The strlen
function measures the length of the atom in %r12
and leaves the result in %r8
.
strlen:
(xor %r8 %r8)
(mov %r12 %rax)
strlen_loop:
(cmpb $0 0\[%rax\]
(jz strlen_exit)
(inc %r8)
(inc %rax)
(jmp strlen_loop)
strlen_exit:
ret
The not
function
The not
function returns True
if this is nil, otherwise it returns nil.
not:
(cmp $0 %r12)
(jne not_yield_nil)
(mov $true %r12)
(mov $0 %r13)
ret
not_yield_nil:
(mov $0 %r12)
(mov $0 %r13)
ret
The write-file
function
The write-file
function writes data from an atom to a file.
write_file:
# %r12 is an atom of the file name to write to
# %r13 is an atom of the data to write to file
#open file
#577 = O_WRONLY | O_CREAT | O_TRUNC
(system-call $2 0\[%r12\] $577 $420)
(mov %rax %r9) # %r9 now holds the file descriptor
#write to file
#r13 is data to write to file
(mov 0\[%r13\] %r12) # %r12 is data pointer
(call strlen) # %r8 now holds length of data
(system-call $1 %r9 %r12 %r8) # (sys_write file_descriptor data_pointer data_length)
#close file
(system-call $3 %r9 $0 $0)
ret
The load-file
function
The load-file
function read data from a file.
load_file:
# this is an atom of the file name to read from
# open file in read-only mode
(system-call $2 %r12 $0 $0)
(mov %rax %r9) # %r9 now holds the file descriptor
(cmp $0 %r9)
(jge load_file_contents)
(mov $err_fopen %r12)
(mov $0 %r13)
ret
load_file_contents:
(allocate-atom-data())
#calculates where to place next atom data in atom pool
#the atom pool is a string pool where new atom data can be written
#r12 holds pointer to head of new data (the pointer to return as an atom)
#r13 holds pointer to tail of new data (currently the same as head pointer)
(mov $0 %r14)
#r14 holds length of raw uncopied data currently in the load_file_buffer
# move data from buffer into string
load_file_loop:
(cmp $0 %r14)
(je load_file_bufempty)
(mov $load_file_buffer %r8)
(movb 0\[%r8\] %bl)
(mov %bl 0\[%r13\])
(inc %r8)
(inc %r13)
(dec %r14)
(jmp load_file_loop)
# read file
load_file_bufempty:
(system-call $0 %r9 $load_file_buf $1024)
(mov %rax %r14)
(cmp $0 %r14)
(jne load_file_loop)
# close file
(system-call $3 %r9 $0 $0)
# finish atom with a null character
(movb $0 0\[%r13\])
(inc %r13)
#update atom pool to protect our new data
(progress-atom-data())
#return our new atom
(mov $0 %r13)
ret
This is an Excerpt from The Bootstrap Book
The Bootstrap Book is released under the terms of the permissive MIT license. If you quote or copy the book, please make a reference. That is all I ask, please.