Framework divides back-end into 7 components:
- user-provided IRAdaptor
- Analyzer for computing block layout and liveness information
- base Compiler which drives compilation
- base Compiler for each architecture
- Assembler to handle object file generation/mapping
- user-provided instruction-selection
- encoding snippets generated by the framework from user-provided high-level functions
┌───────────┐ ┌───────────────────────────┐ ┌───────────┐
│ │◄─────┼ ├────►│ │
│ IRAdaptor │ │ CompilerBase │ │ Assembler │
│ │ │ │ │ │
└───────────┘ │ ┌────────┐ │ └───────────┘
│ │Analyzer│ │
│ └────────┘ │
└─┬───▲──────────┬──▲────▲──┘
│ │ │ │ │
│ │ │ │ │
┌───▼───┴──────┐ │ │ ┌─┴─────────────────┐
│ Compiler for │◄──┼──┼──┤ Encoding Snippets │
│ <arch> │ │ │ │ (optional) │
└──────┬─▲─────┘ │ │ └─▲─────────────────┘
│ │ │ │ │
│ │ │ │ │
│ │ │ │ │
┌─▼─┴─────────▼──┴────┴─────┐
│ │
│ Compiler specific to <IR> │
│ │
└───────────────────────────┘
IRAdaptor:
- user-provided class
- provides functions and definition with which the Compiler can interact with the IR
- this includes a list of functions
- before a function is compiled, adaptor gets a chance to precompute any necessary data structures during compilation
- afterwards adaptor needs to provide an iterator over function arguments, static stack slots, basic blocks, PHIs in each block, instructions in each basic block and their operands and result values
- more detailed description in IRAdaptor reference
Analyzer:
- manages block layout and liveness information for all values
- function analysis done as a first pass before actually compiling the function
- iterates once over the basic blocks provided by the adaptor
- computes block layout mostly according to reverse post-order
- then one pass over all instructions collecting liveness information
- for each value, computes begin and end of its live interval, a refcount and a flag indicating whether the value's assignment can be free'd after its refcount reaches zero or whether the assignment can only be free'd after the end of the live interval
CompilerBase:
- main driver of compilation and provides the entry point into compilation [compile()]
- tracks currently active assignments (register information and stack slot for a value)
- contains architecture-independent logic of the register allocation (PHI handling, spilling decisions)
- when compilation starts, it sets up symbols for each function and gives derived compilers a chance to do initial setup (e.g. generation of data for global values)
- then proceeds in a per-function manner:
- notify Adaptor that a new function is being compiled
- ask analyzer to compute block layout and liveness information
- setup of static stack slots
- ask architecture-specific implementation to generate the function prologue and set-up assignments for arguments
- iterate over each block in the order specified by the analyzer:
- iterate over each instruction in the block as given by the Adaptor and ask the user-provided Compiler implementation to generate the machine code for the instruction
- free any assignments which are no longer live that could not be free'd when their refcount reached zero
CompilerBase for each architecture:
- architecture-specific functionality such as prologue/epilogue generation and argument setup
- otherwise, mostly helpers for commonly needed features like spilling/loading registers to/from the stack, constant materialization, code generation for (un)conditional branches and function call generation
Assembler:
- architecture- and platform-specific; currently only x86-64/AArch64 Linux ELF
- keeps track of symbols and section contents as well as relocations
- also houses unwind and C++ exception table information
- at the end, produces a finished object file or may also be able to map the code directly into memory and do all fixup required to make it executable
user-provided Compiler implementation:
- provides architecture-specific information about value parts
- value may be split into arbitrarily many parts
- each part has a specific size and register bank
- each part must fit into exactly one register
- some information for calling convention when dealing with multi-part values
- can a value be passed in both registers and the stack?
- does a value have to be aligned in registers?
- most importantly, provides a function to generate machine code for each instruction
- interlude: value assignment tracking
- value assignments are accessed through ValueRefs which does refcounting using RAII semantics
- ValueRefs then provide access to individual parts of a value using ValuePart(Refs) which can then be used to load them to registers, get their stack frame offset, etc.
- ValueRefs can also refer to a simple constant, when an assignment is accessed, user implementation gets a chance to simply return a constant ValueRef
- EXAMPLE
- machine code generation complexity depends on instruction complexity
- can be as simple as accessing the assignment of the operands and result, loading the operands into registers, asking for a register for the result and then emitting instructions that implement the semantics of the IR instruction and make sure the result is in the result register given
- TODO: EXAMPLE? may be a bit overwhelming
encoding snippets:
- since writing down the instruction selection for each instruction is difficult, a lot of work and not portable, TPDE includes a tool to generate code that generates a specific sequence of instructions given a high-level function
- example:
int addi32(int a, int b) { return a + b; }
which yields the following machine code
- the tool will now produce a function that will take two ValueParts corresponding to the two input registers, load both to registers, allocate a result register (or reuse one of the input registers if it can be overwritten), generate a
LEA
instruction with these registers and return the result register to the caller.
- conceptually it will look like this:
void encode_add(ValuePartRef&& a, ValuePartRef&& b, ScratchReg& result) {
AsmReg a_reg = a.load_to_reg();
AsmReg b_reg = b.load_to_reg();
AsmReg res_reg;
if (a.can_salvage())
res_reg = result.alloc(a.salvage());
else if (b.can_salvage())
res_reg = result.alloc(b.salvage());
else
res_reg = result.alloc();
ASM(LEA32rm, res_reg, MEM(a_reg, b_reg));
}
- these function can be generated for multiple architectures and then called from architecture-independent code
- they can also take expressions that may be fused into memory operands of instructions
- significantly simplifies implementation of machine code generation
- instead of having to think about salvaging, loading registers, which instructions to use one can defer to the tool and focus on semantics
- e.g. instead of having to implement this for addition and subtraction, the implementation for these instructions can be as simple as
bool compile_binary(IRInstRef inst, BinaryOp op) {
IRValueRef [lhs, rhs] = this->binary_ops(inst);
IRValueRef res = this->inst_res(inst);
ValueRef lhs = this->value_ref(lhs), rhs = this->value_ref(rhs);
ScratchReg result{this};
if (op == BinaryOp::add)
this->encode_add(lhs.part(0), rhs.part(0), result);
else
this->encode_sub(lhs.part(0), rhs.part(0), result);
this->result_ref(res).part(0).set_reg(result);
return true;
}
which is completely architecture-independent without any regards to the actual instructions being generated
- extremely usefule for fast porting to other architectures and when the necessary instruction sequences are long (e.g. float/double to/from int conversions)
- currently no support for branching to other basic blocks in the function or function calls
Code Layout: