Understanding LLVM IR with TCO
Contents:
# The first program
Let’s write some basic C code to see how its LLVM IR looks like. LLVM IR is SSA (Single Static Assignment) type sorta low level language that all frontends using LLVM compile to (example clang). I will be using clang to generate the IR and we’ll go through what it exactly means. The manual for the IR will be referenced in the discussion.
Let’s start with the basic hello world in C (we’ll slowly get more complex, and finish with a tail recursive fibonacci function)
int
Now we need to convert this to the LLVM IR. Reading the clang man page we get this:
-flto, -flto=full, -flto=thin, -emit-llvm
Generate output files in LLVM formats, suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this
generates LLVM bitcode format object files (which may be passed to the linker depending on the stage selection options).
So, we just need to do:
And we have this piece of beauty:
; ModuleID = 'hello.c'
source_filename = "hello.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
@.str = private unnamed_addr constant c"helloWorld!\00", align 1
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @ #0
declare i32 @ #1
attributes #0 =
attributes #1 =
!llvm.module.flags = !
!llvm.ident = !
!0 = !
!1 = !
!2 = !
!3 = !
!4 = !
!5 = !
# Analysis
Okay, where do we start? Let’s see if there are any reference materials for the LLVM IR. Ummm. good news and bad news. Good news is I found the reference manual. bad news is that’s the only think I found. See for youself here: Manual.
Few bits of information:
- Comments are started off with a semi-colon, that’s good to know
- Variables come in two types, global (preceeded by an
@) and local (preceeded by a%). Usually we’ll see%a lot because we tend of define stuff inside functions locally and all. - String definitions are simple, each character is read literally except a backslash. They start and end with
". So, in my code,
@.str = private unnamed_addr constant c"helloWorld!\00", align 1
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @ #0
This is the more important part. Here a global variable named
.stris assigned and its “helloWorld!\00” The final \00 is the null byte which if you will recall is what C uses to define end of string and all.The
privatelinkage type basically means: it is only directly accessible by objects in the current module.Global variables can be marked with
unnamed_addrwhich indicates that the address is not significant, only the content.[12 x i8]-> This simply means array of 12 “8 bit” integer values. Why 12? Because our string length is 12 (including null byte). Each letter is represented by an 8 bit integer. That’s beautiful.align
or align( ): This indicates that the pointer value or vector of pointers has the specified alignment. This probably makes sense to compiler folks by not me, so I am going to take lite on this for now. I am assuming its about word (as in the computing definition ) alignment but then what does align 1oralign nexactly mean?Then we have a snarky little comment and then the main function definition:
define dso_local i32 @ #0dso_local: The compiler may assume that a function or variable marked as dso_local will resolve to a symbol within the same linkage unit. Direct access will be generated even if the definition is not within this compilation unit.
In our case, the
dso_localis binding for the main function.%1and%2are the language’s way of defining every unnamed variabe. It cannot let anything hanging and it starts sequentually. these are all local values. This is the whole idea behind Static Single Assignment (SSA) for which LLVM owns a part of its populatrity.- This line:
store i32 0, i32* %1, align 4Looks like its trying to store the return value 0, by giving it a pointer which we just defined before
%1. Actually no, this is an artifact of the unoptimized representation. This is equivalent to storing a random 0 for no reason because in the final return statement, it never referenes%1! Not sure why it does this.Its kind of like
clang'sfault. So, hahaha.Then we call the function
printf(which has a global definition), on our global.strvariable@.strand returns 0.Obviously a bunch of other things were passed to
printf(), but that’s doing pointer arithmetic and we’ll ignore that for now.
What about a bunch of other stuff that the code has:
declare i32 @ #1 attributes #0 = attributes #1 = !llvm.module.flags = ! !llvm.ident = ! !0 = ! !1 = ! !2 = ! !3 = ! !4 = ! !5 = !What’s all this about?!
- The
!is the metadata prefix. We have named and unnamed metadata (which get sequenced from 0). So all the lower 6 lines above, are unnamed metadata for this artifact.
Information about the module as a whole is difficult to convey to LLVM’s subsystems. The LLVM IR isn’t sufficient to transmit this information. The
llvm.module.flagsnamed metadata exists in order to facilitate this. These flags are in the form of key / value pairs — much like a dictionary — making it easy for any subsystem that cares about a flag to look it up.Note how it makes a reference to the unnamed metadata. That’s just bookkeeping.
The
attributestag is from the “attribute groups”: Attribute groups are groups of attributes that are referenced by objects within the IR. They are important for keeping .ll files readable, because a lot of functions will use the same set of attributes. In the degenerate case of a .ll file that corresponds to a single .c file, the single attribute group will capture the important command line flags used to build that file.There are target dependent (
#0) and target independent (#1). So this linedeclare i32 @printf(i8* noundef, ...) #1then means thatprintfhas the attributes of attribute group#1.- And
mainif you will recall then has the attributes of#0, which are target dependent.
# The second program
Now that we understand this a little bit, lets look at more relatively complex codes and see what changes and how.
intI just remembered how one of the optimizations that LLVM would do, was to replace printf with puts when we’re not passing in dynamic values.
so looking at this IR:
; ModuleID = 'hello2.c' source_filename = "hello2.c" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-pc-linux-gnu" @.str = private unnamed_addr constant c"sum=%d\00", align 1 ; Function Attrs: noinline nounwind optnone uwtable define dso_local i32 @ #0 declare i32 @ #1 attributes #0 = attributes #1 = !llvm.module.flags = ! !llvm.ident = ! !0 = ! !1 = ! !2 = ! !3 = ! !4 = ! !5 = !# Analysis
We’ve understood most of the stuff, the important things other than bookkeeping are just these two things:
@.str = private unnamed_addr constant c"sum=%d\00", align 1 ; Function Attrs: noinline nounwind optnone uwtable define dso_local i32 @ #0This time the number of bytes in the
.strglobal variable is 7, it is litearlly counting the format specifier! This is ofcourse the non-optimized case. We’ll see what happens in the optimized case soon.So it assigned three placeholders in memory with local unnamed variables, then it stores 0 again to one of them (as it did last time!), then stores the two integers we defined to be added together.
Then it loads the two defined values into two new locations and calls
addon them. The nsw keyword stands forno signed wrapwhich if specified means that the result value of the add is a poison value if signed overflow occurs.
Basically its saying that this addition is not protected from signed overflows.
- Then it does the same things as before and prints it out. You can see printf calling
%6this time! Compare that with the older vanilla printf:
%2 = call @No arguments at all!
# The third program (with optimizations enabled)
Let’s try to see the optimizations now, what happens if I write a recursive function and let it perform a Tail call optimisation?
int int int intSo this is the program I have, slightly more complex but nothing over the top. Thanks to OCaml for making my understanding of creating tail-recursive functions clear.
So now this should TCO the tailfib function but not the notailfib. If you’re struck with what TCO means then checkout my blog on that. Anyway, it simply means that instead of “calling the function” (as in a
calloperand in assembly) which takes up more overhead in the stack, we perform a “jump” because after all the function is already defined and when we write a recursive loop, a JUMP is what we MEAN.Jumps are faster than calls, so that’s the optimization. Let’s see what clang does (note that we’ll have to compile this with optimizations enabled as it won’t do it by default)
; ModuleID = 'fib.c' source_filename = "fib.c" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-pc-linux-gnu" @.str = private unnamed_addr constant c"This is the %dth fib number (notail): %d\0A\00", align 1 @.str.1 = private unnamed_addr constant c"This is the %dth fib number (tail): %d\0A\00", align 1 ; Function Attrs: nofree nosync nounwind readnone uwtable define dso_local i32 @ local_unnamed_addr #0 ; Function Attrs: nofree nosync nounwind readnone uwtable define dso_local i32 @ local_unnamed_addr #0 ; Function Attrs: nofree norecurse nosync nounwind readnone uwtable define dso_local i32 @ local_unnamed_addr #1 ; Function Attrs: nofree nounwind uwtable define dso_local i32 @ local_unnamed_addr #2 ; Function Attrs: nofree nounwind declare noundef i32 @ local_unnamed_addr #3 attributes #0 = attributes #1 = attributes #2 = attributes #3 = !llvm.module.flags = ! !llvm.ident = ! !0 = ! !1 = ! !2 = ! !3 = ! !4 = !# Analysis
Let’s first look at the
notailfib:define dso_local i32 @ local_unnamed_addr #0- The old syntax of a global variable definition (
@) is there, along with thedso_localdeclaration. - This time the function takes in a parameter which is a local unname variable (
%0). - For the
brinstruction you can choose to read the documentation or my simple explanation: The way it is being used here, its an unconditional branch to a label identified as2.
So the program will start at the label 2, let’s see what it does from there on:
2: ; preds = %6, %1 %3 = phi i32 , %4 = phi i32 , %5 = icmp ult i32 %4, 2 br i1 %5, label %11, label %6For the
phiinstruction, you can again choose to read the docs, or simply understand the semantics that at runtime, thephiinstruction logically takes on the value specified by the pair corresponding to the predecessor basic block that executed just prior to the current block.In other words, it allows for you to dynamically update a variable AND reference that in the next turn. Usually used in case of loops or recursive functions!
This is an example from the docs:
Loop: ; Infinite loop that counts from 0 on up... %indvar = phi i32 , %nextindvar = add i32 %indvar, 1 br label %LoopNote that something interesting has happened here. If you look at branch
%6, there is a “tail call” there, this is because LLVM managed to perform TCO for one of the two recursive calls!%3 = phi i32 ,So, this is telling us that if coming from
%1, deifined inside main like this:%1 = tail call i32 @then
%3will take a 0, otherwise if coming from%6,%3will take the value of%10. We’ll worry about%10in just a minute. So, similarly we have%4also defined, and finally%5compares%4to 2 and decides to branch based on the value of%5, either to%11or%6.What this means is, if
%4is less than 2, we branch to%11, which returns a sum of%4and%3. This is the base case btw, which will become clearer as we explore what%3actually is. This implies that%4is actually the number we entern.Note that this is not very different from reading assembly, in fact, this can be a very good material to learn assembly! Anyways, let’s say we head to %6 from the above branching, what happens then?
6: ; preds = %2 %7 = add nsw i32 %4, -1 %8 = tail call i32 @ %9 = add nsw i32 %4, -2 %10 = add nsw i32 %8, %3 br label %2First thing to notice is that this branches unconditionally to
%2. We decrement the%4value (remember that this will be populated from the%2block), and this should remind you aboutnotailfib (n-1)because the next line does exactly that!Notice how LLVM is flagging this as a tail-call? As I said earlier, it looks like the compiler figured out a way to make 1 recursive call into a loop.
Then we do
n-2and store that in%9and we take the value we got fromnotailfib (n-1)and add that to%3. And%3if you recall is either 0 or%10(which is this value we’re currently setting) depending on where it comes from.Since right now it comes from
%6, the value of%3will be%10. Now we go back to the %2 block and again check our base case while evaluating on(n-2)instead ofn(why? Because%4will pick up the value of%9which wasn-2).
So essentially the compiler figured out that we can loop for
n-2and call the function forn-1which is better than calling for both!Okay now equipped with this you can probably trace out the entire flow for this function given the input 15. Its a fun exercise! I will move on to the tail recursive function now:
define dso_local i32 @ local_unnamed_addr #0Lets first look at
recand then we’ll talk about whattailfibdoes. Let’s not go over the full flow of the program again, but talk about only the differences:This is using a
switchto handle the 0 and 1 case. Is this more efficient than a simpleicmp ult? Anicmpis essentially an if-else equivalent and a switch is faster IF the number of entries in the jump table are large enough to justify it. In this case, the choice is arbitrary in my opinion and won’t really optimize or fuck the code.There are no function calls!
This is a complete loop with no recursive function calls at all. This is the exactly what TCO promises us!
And if we checkout the
tailfibfunction we’ll see that its pretty much the same thing. Why? Because the compiler is smart! It figured out that we’re callingrec()inside the failfib function and doing nothing else, so it substituted the entire code forrecinside it with the right params so there is no function call overhead!Okay, so with that, we have come to an end of our discussion on LLVM IR. It has been fun and entertaining and I got to learn something new.
Did you like this blogpost? Then consider catching up via LinkedIn or Github!