Building a Functional Expression Language within C's Type System
Yes, you can nominally type almost everything you may want in C
LM is a functional expression language that is fairly good at generating object code for unusual targets, including cross-compiled ones. To demonstrate this, and to bring LM to a much larger audience than x86-64 Linux, we are developing a C backend. The initial LM release was limited to the x86 binary target, and LM has been steadily gaining new features since then.
How do LM types fit into C’s type system?
In the C backend for LM we would like C types to handle all of the memory alignment and field offset calculations. To do this we need to translate every LM type, such as Tuple<U64,String>, into a C type, such as struct uuid_3412531. The translation is fairly straightforward. C has no built-in tagged union, but a struct that pairs a tag with a union expresses one directly if we are careful.
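As a rough sketch of what "letting C handle the layout" means, the following shows one way Tuple<U64,String> might be rendered. The names struct LM_String and struct Tuple_U64_String, and the pointer-plus-length representation of String, are assumptions made for illustration; the real backend generates names such as struct uuid_3412531 and may lay out String differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C rendering of the LM type String: a pointer plus a length. */
struct LM_String {
    const char *data;
    uint64_t    length;
};

/* Hypothetical C rendering of Tuple<U64,String>. The name stands in for a
   generated one such as struct uuid_3412531. */
struct Tuple_U64_String {
    uint64_t         _1;
    struct LM_String _2;
};

/* The C compiler, not the LM backend, decides where each field lives. */
size_t tuple_second_offset(void) {
    return offsetof(struct Tuple_U64_String, _2);
}

/* The C compiler also accounts for any padding in the total size. */
size_t tuple_size(void) {
    return sizeof(struct Tuple_U64_String);
}
```

The backend only has to emit field names like _1 and _2. Offsets, padding, and alignment fall out of the struct definition for whichever target the C compiler is aimed at.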
All LM types can be thought of as sums of tuples, such as
type Option<a> = (Some a) | None;
Before translating this into C we first need to understand that C does not support parameterized types. Instead we will create a new C type for each concrete instance of our LM type, such as Option<U64> or Option<Tuple<U64,String>>.
Translating a definition of Option<U64> into C might look like the following:
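struct Option_U64 {
    long tag;                 /* which case is active */
    union {
        struct { long _1; };  /* Some: one field (C11 anonymous struct) */
        struct {};            /* None: no fields (empty structs are a GNU C extension) */
    };
};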
Here we have a tagged union for the Option type. It has a tag, which records which case is active, and a union, which holds the fields of that case. I guess that makes sense.
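To make the role of the tag concrete, here is a minimal sketch of reading a value back out of the struct. The enum names and the helper option_u64_get_or are hypothetical. The article's later initializer uses 0 as the tag for the Some case; using 1 for None is an assumption. The None case is left without a union member here so the sketch stays within standard C11.

```c
#include <assert.h>

/* Hypothetical tag values: 0 for Some matches the article's initializer,
   1 for None is an assumption. */
enum { OPTION_SOME = 0, OPTION_NONE = 1 };

struct Option_U64 {
    long tag;
    union {
        struct { long _1; };   /* payload of Some (C11 anonymous struct) */
        /* None carries no payload, so it needs no member in this sketch */
    };
};

/* Return the payload of Some, or the fallback for any other tag. */
long option_u64_get_or(struct Option_U64 opt, long fallback) {
    switch (opt.tag) {
    case OPTION_SOME: return opt._1;
    default:          return fallback;
    }
}
```

Because the inner struct and union are anonymous, the payload is reachable as opt._1 without naming the intermediate layers.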
How do LM expressions fit into C’s expression language?
The only remaining hang-up for our C backend is converting all of our LM expressions into equivalent C code. Thankfully C is also, to a large degree, an expression language.
With the GNU C statement-expression extension, which GCC and Clang both support, C expressions can even have full statements in expression position if you wrap them in a block inside parentheses:
({ while ( condition ) { do_something(); } })
This expression has type void. It consists of a while statement inside a block inside parentheses.
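A compilable variant of the same idea, as a sketch: the function name sum_to is hypothetical, and the code relies on GNU C statement expressions, so it needs GCC or Clang rather than a strictly ISO C compiler.

```c
#include <assert.h>

/* Sum the integers 1..n with a loop sitting in expression position.
   Requires GNU C statement expressions (GCC or Clang). */
long sum_to(long n) {
    return ({
        long total = 0;
        long i = 1;
        while (i <= n) { total += i; i += 1; }
        total;   /* the last expression is the value of the whole block */
    });
}
```

Unlike the void example, this block ends in an expression statement, so the whole construct has type long and can be returned or nested inside a larger expression.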
The only small issue that we had during code generation was constructing new instances of data types. For example, how can we return a new Option_U64 from expression position?
The answer is again to use those strange blocks as expressions. The last statement of an expression block will become the value of the expression itself. Using this syntax again we can initialize new instances of any datatype:
({ struct Option_U64 rvalue = { 0, 24 }; rvalue; })
This C expression is equivalent to the LM expression
(Some 24)
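One way a code generator might package this pattern is as a pair of constructor macros. This is only a sketch: the macro names, the helper functions, and the tag value 1 for None are all assumptions, and the macros depend on GNU C statement expressions.

```c
#include <assert.h>

struct Option_U64 {
    long tag;
    union {
        struct { long _1; };   /* payload of Some */
    };
};

/* Hypothetical constructors built from statement expressions.
   Tag 0 for Some follows the article; tag 1 for None is an assumption. */
#define OPTION_U64_SOME(v) ({ struct Option_U64 rvalue = { 0, { { (v) } } }; rvalue; })
#define OPTION_U64_NONE()  ({ struct Option_U64 rvalue = { 1 }; rvalue; })

/* Hypothetical consumer: the payload of Some, or 0 for None. */
long unwrap_or_zero(struct Option_U64 opt) {
    return opt.tag == 0 ? opt._1 : 0;
}

/* Constructors used directly in argument position. */
long demo(void) {
    return unwrap_or_zero(OPTION_U64_SOME(24)) + unwrap_or_zero(OPTION_U64_NONE());
}
```

The fully braced initializer spells out each nesting level (struct, union, inner struct), which avoids missing-braces warnings. A standard C99 compound literal, (struct Option_U64){ 0, { { 24 } } }, would build the same value without the extension. The statement-expression form has the advantage that it also accommodates arbitrary statements before the final value.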
Conclusion
The C backend for LM is almost working. Hopefully in the next few days we will see a big release that includes this major development. Then we can refocus LM development on creating new solutions to harder problems.