LLVM
I recently had the opportunity to get up to speed on LLVM as part of a project to replace the interpreter of an in-house functional language.
What is it?
As you’ll read elsewhere on the web, the name originally stood for ‘low level virtual machine’, but it is really a collection of compiler libraries and tools.
You, the developer, would take your parser, generated by Coco/R for example, and then use the LLVM libraries to generate LLVM’s intermediate representation (IR), a typed, low-level instruction set that is not tied to any particular machine. The IR can then be run by LLVM’s JIT, or compiled to a native executable with the tools the framework provides.
It is the back end of the Clang C/C++ compiler (and, historically, of the llvm-gcc front end), and is also available as an optional back end in the Mono framework.
LLVM is written in C++ and uses the STL extensively.
Getting started
First, you need to download the latest version of the framework and build it on your platform with CMake.
Once complete, you will have the entire toolset, including the llvm linker, debugger, static libraries, C++ headers, and examples, plus much more.
How does it work?
Once you have your parser front-end complete, you start to plug in LLVM by including the headers.
Probably the most important class is the IRBuilder. As its name suggests, it builds the intermediate representation. The Kaleidoscope example demonstrates its usage.
Lessons
Whilst coding this, assert becomes your friend. As with all compilers and parsers, you will be walking an abstract syntax tree, and much of the work you do will be on the stack, so liberally asserting the stack size will save you a lot of time.
Also, once you get past adding the basic types, you will find that the syntax tree is walked in a somewhat different order than you might expect, so any plan for which part of the parser to implement next will not necessarily be followed.
For our language, at least, you need to implement a main function with a fixed return type, e.g. int main in C/C++. In the Kaleidoscope example, you’ll notice that double is the only type it supports, and that each top-level expression is wrapped in an anonymous function returning a double. Once the main function is implemented you can start inserting instructions.
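To make the point about asserting the stack size concrete, here is a minimal sketch of a post-order tree walk that checks the stack depth at every step. The Node type and the lit, bin, visit and evaluate helpers are hypothetical names invented for illustration, and the stack holds plain ints; in a real code generator it would more likely hold pointers to generated IR values.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical AST for illustration: a number literal or a binary operation.
struct Node {
    char op = '\0';                  // '\0' for a literal, otherwise '+', '-' or '*'
    int value = 0;                   // used only when op == '\0'
    std::unique_ptr<Node> lhs, rhs;  // used only for binary operations
};

std::unique_ptr<Node> lit(int v) {
    auto n = std::make_unique<Node>();
    n->value = v;
    return n;
}

std::unique_ptr<Node> bin(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->op = op;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

// Post-order walk: every visit must leave exactly one extra value on the stack.
void visit(const Node& n, std::vector<int>& stack) {
    const std::size_t before = stack.size();
    if (n.op == '\0') {
        stack.push_back(n.value);
    } else {
        assert(n.lhs && n.rhs && "binary node needs two children");
        visit(*n.lhs, stack);
        visit(*n.rhs, stack);
        assert(stack.size() == before + 2 && "operands missing from stack");
        const int r = stack.back(); stack.pop_back();
        const int l = stack.back(); stack.pop_back();
        switch (n.op) {
            case '+': stack.push_back(l + r); break;
            case '-': stack.push_back(l - r); break;
            case '*': stack.push_back(l * r); break;
            default:  assert(false && "unknown operator");
        }
    }
    assert(stack.size() == before + 1 && "visit must push exactly one value");
}

int evaluate(const Node& root) {
    std::vector<int> stack;
    visit(root, stack);
    assert(stack.size() == 1 && "only the result should remain");
    return stack.back();
}
```

Calling evaluate on the tree built by bin('*', bin('+', lit(1), lit(2)), lit(4)) walks both operands before the operator. If a visit forgets to push or pop, the assert fires at the node that caused the problem rather than much later.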
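As a rough illustration of "implement main, then insert instructions", the sketch below assembles the textual form of the IR for an int main by hand. It deliberately does not use the IRBuilder or any LLVM header, so it shows only the shape of the output, not the API. MainEmitter and its add and finish methods are hypothetical names, not part of LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical emitter: collects the instructions of a single `i32 @main`
// and renders them as textual LLVM IR. Not part of the LLVM API.
class MainEmitter {
public:
    // Appends `%tN = add i32 lhs, rhs` and returns the name of the result.
    std::string add(const std::string& lhs, const std::string& rhs) {
        assert(!lhs.empty() && !rhs.empty() && "operands must be named or constant");
        std::string name = "%t" + std::to_string(next_++);
        body_.push_back(name + " = add i32 " + lhs + ", " + rhs);
        return name;
    }

    // Renders the whole function, returning `result` from main.
    std::string finish(const std::string& result) const {
        assert(!result.empty() && "main must return a value");
        std::string ir = "define i32 @main() {\nentry:\n";
        for (const auto& line : body_) ir += "  " + line + "\n";
        ir += "  ret i32 " + result + "\n}\n";
        return ir;
    }

private:
    std::vector<std::string> body_;  // instructions in insertion order
    int next_ = 0;                   // counter for temporary names
};
```

Each call to add returns the name of the value it defines, which is then passed as an operand to the next instruction, much as the IRBuilder hands back a value for every instruction it creates. The rendered text should be acceptable to LLVM tools that read textual IR, though that is untested here.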
Implementing LLVM for a language takes the following steps:
- Implement basic types and operators
- Implement basic casting
- Implement function prototype and function body code
- Generate an int main
- Implement blocks and scopes
- Implement conditionals, and loops
- String handling
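The "basic casting" step in the list above largely comes down to picking the right IR conversion instruction for a pair of types. Here is a minimal sketch, assuming a language with only signed integers and double. The Type struct and castOpcode function are hypothetical; the opcode names it returns (sitofp, fptosi, sext, trunc) are real LLVM IR instructions.

```cpp
#include <cassert>
#include <string>

// Hypothetical type model: a signed integer of a given width, or double.
struct Type {
    bool isFloat;   // true for double
    unsigned bits;  // integer width in bits; 64 for double
};

// Returns the LLVM IR opcode that converts `from` to `to`,
// or an empty string when the types already match.
std::string castOpcode(Type from, Type to) {
    assert(from.bits > 0 && to.bits > 0 && "types must have a width");
    if (from.isFloat && to.isFloat) return "";  // double to double
    if (from.isFloat) return "fptosi";          // double to signed integer
    if (to.isFloat) return "sitofp";            // signed integer to double
    if (from.bits == to.bits) return "";        // same integer type
    if (from.bits > to.bits) return "trunc";    // narrow the integer
    return "sext";                              // widen, preserving the sign
}
```

Unsigned integers would need uitofp, fptoui and zext, and a language with both float and double would also need fpext and fptrunc. They are left out to keep the sketch short.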
At this point our project was cancelled, but I expect the following steps would have been taken:
- Implement multi-file ‘module’ linking
- Implement classes and structs
- Add optimization stages
- Move from running code under JIT to full LLVM compilation to native binaries for performance.
At this point I would add ‘goto start’, whereby the experience gained from implementing LLVM is reapplied to the whole problem: you might choose to make variables immutable once set; add new language keywords; or, more significantly, add implicit threading to the library (as in Erlang).
At some point I hope to revisit this and complete an implementation of a non-trivial language.