LLVM & Haskell (PART III)

Idriss Chaouch
12 min read · Sep 17, 2020

Welcome to this third and last part of our discussion about LLVM. We will continue with user-defined operators and mutable variables.

User-defined Operators

The “operator overloading” that we will add to Kaleidoscope is more general than in languages like C++. In C++, we are only allowed to redefine existing operators: we can’t programmatically change the grammar, introduce new operators, change precedence levels, etc. In this story, we will add this capability to Kaleidoscope, which will let the user round out the set of operators that are supported.

The two specific features we’ll add are programmable unary operators (right now, Kaleidoscope has no unary operators at all) as well as binary operators. An example of this is:

# Logical unary not.

def unary!(v)
  if v then 0 else 1;

# Define > with the same precedence as <.

def binary> 10 (LHS RHS) RHS < LHS;

# Binary “logical or”, (note that it does not “short circuit”)

def binary| 5 (LHS RHS)
  if LHS then
    1
  else if RHS then
    1
  else
    0;

# Define = with slightly lower precedence than relationals.

def binary= 9 (LHS RHS) !(LHS < RHS | LHS > RHS);

Many languages aspire to being able to implement their standard runtime library in the language itself. In Kaleidoscope, we can implement significant parts of the language in the library!

We will break down implementation of these features into two parts: implementing support for user-defined binary operators and adding unary operators.

Binary Operators

We extend the lexer with two new keywords for “binary” and “unary” toplevel definitions.

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    ops = ["+","*","-","/",";","=",",","<",">","|",":"]
    names = ["def","extern","if","then","else","in","for"
            ,"binary","unary"]
    style = emptyDef {
        Tok.commentLine = "#"
      , Tok.reservedOpNames = ops
      , Tok.reservedNames = names
      }

Parsec has no default function to parse “any symbolic” string, but it can be added simply by defining a new operator token.

operator :: Parser String
operator = do
  c <- Tok.opStart emptyDef
  cs <- many $ Tok.opLetter emptyDef
  return (c:cs)
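The snippets below also make use of an op parser that is not shown in this part. A minimal sketch of a definition that fits that usage, assuming the lexer defined above (the exact form is an assumption, not code from the original series):

op :: Parser String
op = do
  Tok.whiteSpace lexer   -- consume leading whitespace/comments
  o <- operator          -- read the raw symbolic characters
  Tok.whiteSpace lexer   -- consume trailing whitespace so op behaves as a lexeme
  return o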

Using this we can then parse any binary expression. By default all our operators will be left-associative and have equal precedence, except for the builtins we provide. A more general system would allow the parser to have internal state about the known precedences of operators before parsing. Without predefined precedence values we’ll need to disambiguate expressions with parentheses.

binop = Ex.Infix (BinaryOp <$> op) Ex.AssocLeft

Using the expression parser we can extend our table of operators with the “binop” class of custom operators. Note that this will match any and all operators even at parse-time, even if there is no corresponding definition.

binops = [[binary "*" Ex.AssocLeft,
           binary "/" Ex.AssocLeft]
         ,[binary "+" Ex.AssocLeft,
           binary "-" Ex.AssocLeft]
         ,[binary "<" Ex.AssocLeft]]

expr :: Parser Expr
expr = Ex.buildExpressionParser (binops ++ [[binop]]) factor
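For reference, the binary helper used in the table above comes from the earlier parts of this series; a minimal sketch of it, assuming the reservedOp lexeme from our lexer:

-- Build an infix table entry for a reserved operator name.
binary s assoc = Ex.Infix (reservedOp s >> return (BinaryOp s)) assoc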

The extensions to the AST consist of adding new toplevel declarations for the operator definitions.

data Expr = ...
  | BinaryOp Name Expr Expr
  | UnaryOp Name Expr
  | BinaryDef Name [Name] Expr
  | UnaryDef Name [Name] Expr

The parser extension is straightforward and essentially a function definition with a few slight changes. Note that we capture the string value of the operator as given to us by the parser.

binarydef :: Parser Expr
binarydef = do
  reserved "def"
  reserved "binary"
  o <- op
  prec <- int
  args <- parens $ many identifier
  body <- expr
  return $ BinaryDef o args body

To generate code we’ll implement two extensions to our existing code generator. At the toplevel we’ll emit BinaryDef declarations by simply creating a normal function with the name “binary” suffixed with the operator.

codegenTop (S.BinaryDef name args body) =
  codegenTop $ S.Function ("binary" ++ name) args body

Now, for binary operators, instead of failing when we encounter an operator that is not declared in our binops list, we instead emit a call to a function whose name is “binary” followed by the operator.

cgen (S.BinaryOp op a b) = do
  case Map.lookup op binops of
    Just f  -> do
      ca <- cgen a
      cb <- cgen b
      f ca cb
    Nothing -> cgen (S.Call ("binary" ++ op) [a,b])

Unary Operators

For unary operators we implement the same strategy as binary operators. We add a parser for unary operators simply as a Prefix operator matching any symbol.

unop = Ex.Prefix (UnaryOp <$> op)

We add this to the expression parser like above.

expr :: Parser Expr
expr = Ex.buildExpressionParser (binops ++ [[unop], [binop]]) factor

The parser extension for the toplevel unary definition is precisely the same as function syntax except prefixed with the “unary” keyword.

unarydef :: Parser Expr
unarydef = do
  reserved "def"
  reserved "unary"
  o <- op
  args <- parens $ many identifier
  body <- expr
  return $ UnaryDef o args body
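Both new toplevel forms also need to be reachable from the definition parser. A minimal sketch of the wiring, assuming the extern and function parsers from the earlier parts of this series (the ordering shown is one reasonable choice; try lets each alternative backtrack over the shared def keyword):

defn :: Parser Expr
defn = try extern
   <|> try unarydef    -- def unary! (v) ...
   <|> try binarydef   -- def binary| 5 (LHS RHS) ...
   <|> try function    -- ordinary def name(args) ...
   <|> expr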

For toplevel declarations we’ll simply emit a function with the convention that the name is prefixed with the word “unary”. For example (“unary!”, “unary-”).

codegenTop (S.UnaryDef name args body) =
  codegenTop $ S.Function ("unary" ++ name) args body

Up until now we have not had any unary operators, so for code generation we simply always search for an implementation as a function.

cgen (S.UnaryOp op a) = do
  cgen $ S.Call ("unary" ++ op) [a]

That’s it for unary operators, quite easy indeed!

Kicking the Tires

It is somewhat hard to believe, but with a few simple extensions we’ve covered in the last chapters, we have grown a real-ish language. With this, we can do a lot of interesting things, including I/O, math, and a bunch of other things. For example, we can now add a nice sequencing operator (printd is defined to print out the specified value and a newline):

ready> extern printd(x);

declare double @printd(double)

ready> def binary : 1 (x y) 0;

..

ready> printd(123) : printd(456) : printd(789);

123.000000

456.000000

789.000000

Evaluated to 0.000000

We can also define a bunch of other “primitive” operations, such as:

# Logical unary not.
def unary!(v)
  if v then 0 else 1;

# Unary negate.
def unary-(v) 0-v;

# Define > with the same precedence as <.
def binary> 10 (LHS RHS) RHS < LHS;

# Binary logical or, which does not short circuit.
def binary| 5 (LHS RHS)
  if LHS then
    1
  else if RHS then
    1
  else
    0;

# Binary logical and, which does not short circuit.
def binary& 6 (LHS RHS)
  if !LHS then
    0
  else
    !!RHS;

# Define = with slightly lower precedence than relationals.
def binary= 9 (LHS RHS) !(LHS < RHS | LHS > RHS);

# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;

Mutable Variables

While Kaleidoscope is interesting as a functional language, the fact that it is functional makes it “too easy” to generate LLVM IR for it. In particular, a functional language makes it very easy to build LLVM IR directly in SSA form. Since LLVM requires that the input code be in SSA form, this is a very nice property and it is often unclear to newcomers how to generate code for an imperative language with mutable variables.

The short (and happy) summary of this chapter is that there is no need for our front-end to build SSA form: LLVM provides highly tuned and well tested support for this, though the way it works is a bit unexpected for some.

Why is this a hard problem?

To understand why mutable variables cause complexities in SSA construction, consider this extremely simple C example:

int G, H;

int test(_Bool Condition) {
  int X;
  if (Condition)
    X = G;
  else
    X = H;
  return X;
}

In this case, we have the variable “X”, whose value depends on the path executed in the program. Because there are two different possible values for X before the return instruction, a Phi node is inserted to merge the two values. The LLVM IR that we want for this example looks like this:

@G = weak global i32 0   ; type of @G is i32*
@H = weak global i32 0   ; type of @H is i32*

define i32 @test(i1 %Condition) {
entry:
  br i1 %Condition, label %cond_true, label %cond_false

cond_true:
  %X.0 = load i32* @G
  br label %cond_next

cond_false:
  %X.1 = load i32* @H
  br label %cond_next

cond_next:
  %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  ret i32 %X.2
}

In the control flow graph for the above IR, the entry block branches to cond_true and cond_false, which both merge at cond_next.

In this example, the loads from the G and H global variables are explicit in the LLVM IR, and they live in the then/else branches of the if statement (cond_true/cond_false). In order to merge the incoming values, the X.2 phi node in the cond_next block selects the right value to use based on where control flow is coming from: if control flow comes from the cond_false block, X.2 gets the value of X.1. Alternatively, if control flow comes from cond_true, it gets the value of X.0. The intent of this chapter is not to explain the details of SSA form.

For more information, see one of the many online references.

The question for this article is “who places the phi nodes when lowering assignments to mutable variables?”. The issue here is that LLVM requires that its IR be in SSA form: there is no “non-SSA” mode for it. However, SSA construction requires non-trivial algorithms and data structures, so it is inconvenient and wasteful for every front-end to have to reproduce this logic.

Memory in LLVM

The ‘trick’ here is that while LLVM does require all register values to be in SSA form, it does not require (or permit) memory objects to be in SSA form. In the example above, note that the loads from G and H are direct accesses to G and H: they are not renamed or versioned. This differs from some other compiler systems, which do try to version memory objects. In LLVM, instead of encoding dataflow analysis of memory into the LLVM IR, it is handled with Analysis Passes which are computed on demand.

With this in mind, the high-level idea is that we want to make a stack variable (which lives in memory, because it is on the stack) for each mutable object in a function. To take advantage of this trick, we need to talk about how LLVM represents stack variables.

In LLVM, all memory accesses are explicit with load/store instructions, and it is carefully designed not to have (or need) an “address-of” operator. Notice how the type of the @G/@H global variables is actually i32* even though the variable is defined as i32. What this means is that @G defines space for an i32 in the global data area, but its name actually refers to the address for that space. Stack variables work the same way, except that instead of being declared with global variable definitions, they are declared with the LLVM alloca instruction:

define i32 @example() {
entry:
  %X = alloca i32           ; type of %X is i32*
  %tmp = load i32* %X       ; load the stack value %X from the stack
  %tmp2 = add i32 %tmp, 1   ; increment it
  store i32 %tmp2, i32* %X  ; store it back
  ...

This code shows an example of how we can declare and manipulate a stack variable in the LLVM IR. Stack memory allocated with the alloca instruction is fully general: we can pass the address of the stack slot to functions, we can store it in other variables, etc. In our example above, we could rewrite the example to use the alloca technique to avoid using a Phi node:

@G = weak global i32 0   ; type of @G is i32*
@H = weak global i32 0   ; type of @H is i32*

define i32 @test(i1 %Condition) {
entry:
  %X = alloca i32
  br i1 %Condition, label %cond_true, label %cond_false

cond_true:
  %X.0 = load i32* @G
  store i32 %X.0, i32* %X
  br label %cond_next

cond_false:
  %X.1 = load i32* @H
  store i32 %X.1, i32* %X
  br label %cond_next

cond_next:
  %X.2 = load i32* %X
  ret i32 %X.2
}

While this solution has solved our immediate problem, it introduced another one: we have now apparently introduced a lot of stack traffic for very simple and common operations, a major performance problem. Fortunately for us, the LLVM optimizer has a highly-tuned optimization pass named “mem2reg” that handles this case, promoting allocas like this into SSA registers, inserting Phi nodes as appropriate. If we run this example through the pass, for example, we’ll get:

$ llvm-as < example.ll | opt -mem2reg | llvm-dis

@G = weak global i32 0
@H = weak global i32 0

define i32 @test(i1 %Condition) {
entry:
  br i1 %Condition, label %cond_true, label %cond_false

cond_true:
  %X.0 = load i32* @G
  br label %cond_next

cond_false:
  %X.1 = load i32* @H
  br label %cond_next

cond_next:
  %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  ret i32 %X.01
}

We say a block “A” dominates a different block “B” in the control flow graph if it’s impossible to reach “B” without passing through “A”, equivalently “A” is the dominator of “B”. The mem2reg pass implements the standard “iterated dominance frontier” algorithm for constructing SSA form and has a number of optimizations that speed up (very common) degenerate cases.

The mem2reg optimization pass is the answer to dealing with mutable variables, and we highly recommend that you depend on it. Note that mem2reg only works on variables in certain circumstances:

• mem2reg is alloca-driven: it looks for allocas and if it can handle them, it promotes them. It does not apply to global variables or heap allocations.

• mem2reg only looks for alloca instructions in the entry block of the function. Being in the entry block guarantees that the alloca is only executed once, which makes analysis simpler.

• mem2reg only promotes allocas whose uses are direct loads and stores. If the address of the stack object is passed to a function, or if any funny pointer arithmetic is involved, the alloca will not be promoted.

• mem2reg only works on allocas of first class values (such as pointers, scalars and vectors), and only if the array size of the allocation is 1 (or missing in the .ll file).

• mem2reg is not capable of promoting structs or arrays to registers. Note that the “scalarrepl” pass is more powerful and can promote structs, “unions”, and arrays in many cases.
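If we want to run this pass from our Haskell driver instead of invoking opt by hand, the bindings expose it directly. A minimal sketch, assuming the llvm-general bindings used in the earlier parts of this series (llvm-hs exposes essentially the same API under LLVM.PassManager and LLVM.Transforms):

import LLVM.General.PassManager
import LLVM.General.Transforms
import qualified LLVM.General.Module as M

-- A pass set containing only mem2reg (PromoteMemoryToRegister).
mem2regSpec :: PassSetSpec
mem2regSpec = defaultPassSetSpec { transforms = [PromoteMemoryToRegister] }

-- Run mem2reg over a module in place; returns True if anything changed.
runMem2Reg :: M.Module -> IO Bool
runMem2Reg m = withPassManager mem2regSpec $ \pm -> runPassManager pm m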

Conclusion

Our little language supports a couple of interesting features: it supports user defined binary and unary operators, it uses JIT compilation for immediate evaluation, and it supports a few control flow constructs with SSA construction.

Part of the idea of this tutorial was to show how easy and fun it can be to define, build, and play with languages. Building a compiler need not be a scary or mystical process! Now that we’ve seen some of the basics, I strongly encourage you to take the code and hack on it. For example, try adding:

• global variables – While global variables have questionable value in modern software engineering, they are often useful when putting together quick little hacks like the Kaleidoscope compiler itself. Fortunately, our current setup makes it very easy to add global variables: just have value lookup check to see if an unresolved variable is in the global variable symbol table before rejecting it.

• typed variables – Kaleidoscope currently only supports variables of type double. This gives the language a very nice elegance, because only supporting one type means that we never have to specify types. Different languages have different ways of handling this. The easiest way is to require the user to specify types for every variable definition, and record the type of the variable in the symbol table along with its Value*.

• arrays, structs, vectors, etc – Once we add types, we can start extending the type system in all sorts of interesting ways. Simple arrays are very easy and are quite useful for many different applications. Adding them is mostly an exercise in learning how the LLVM getelementptr instruction works: it is so nifty/unconventional, it has its own FAQ! If we add support for recursive types (e.g. linked lists), make sure to read the section in the LLVM Programmer’s Manual that describes how to construct them.

• standard runtime – Our current language allows the user to access arbitrary external functions, and we use it for things like “printd” and “putchard”. As we extend the language to add higher-level constructs, often these constructs make the most sense if they are lowered to calls into a language-supplied runtime. For example, if we add hash tables to the language, it would probably make sense to add the routines to a runtime, instead of inlining them all the way.

• memory management – Currently we can only access the stack in Kaleidoscope. It would also be useful to be able to allocate heap memory, either with calls to the standard libc malloc/free interface or with a garbage collector such as the Boehm GC. If we would like to use garbage collection, note that LLVM fully supports Accurate Garbage Collection including algorithms that move objects and need to scan/update the stack.

• debugger support – LLVM supports generation of DWARF Debug info which is understood by common debuggers like GDB. Adding support for debug info is fairly straightforward. The best way to understand it is to compile some C/C++ code with “clang -g -O0” and take a look at what it produces.

• exception handling support – LLVM supports generation of zero cost exceptions which interoperate with code compiled in other languages. You could also generate code by implicitly making every function return an error value and checking it. You could also make explicit use of setjmp/longjmp.

I hope you learned something new. Stay safe and see you soon.


Idriss Chaouch

C++ | STL | Semiconductors | VHDL | SoCs | FPGAs | Python | Visual Studio