Framework divides back-end into 7 components:
- user-provided IRAdaptor
- Analyzer for computing block layout and liveness information
- base Compiler which drives compilation
- base Compiler for each architecture
- Assembler to handle object file generation/mapping
- user-provided instruction-selection
- encoding snippets generated by the framework from user-provided high-level functions
┌───────────┐      ┌───────────────────────────┐     ┌───────────┐
│           │◄─────┼                           ├────►│           │
│ IRAdaptor │      │     CompilerBase          │     │ Assembler │
│           │      │                           │     │           │
└───────────┘      │              ┌────────┐   │     └───────────┘
                   │              │Analyzer│   │                  
                   │              └────────┘   │                  
                   └─┬───▲──────────┬──▲────▲──┘                  
                     │   │          │  │    │                     
                     │   │          │  │    │                     
                 ┌───▼───┴──────┐   │  │  ┌─┴─────────────────┐   
                 │ Compiler for │◄──┼──┼──┤ Encoding Snippets │   
                 │    <arch>    │   │  │  │     (optional)    │   
                 └──────┬─▲─────┘   │  │  └─▲─────────────────┘   
                        │ │         │  │    │                     
                        │ │         │  │    │                     
                        │ │         │  │    │                     
                      ┌─▼─┴─────────▼──┴────┴─────┐               
                      │                           │               
                      │ Compiler specific to <IR> │               
                      │                           │               
                      └───────────────────────────┘                  
IRAdaptor:
- user-provided class
- provides functions and definition with which the Compiler can interact with the IR
- this includes a list of functions
- before a function is compiled, adaptor gets a chance to precompute any necessary data structures during compilation
- afterwards adaptor needs to provide an iterator over function arguments, static stack slots, basic blocks, PHIs in each block, instructions in each basic block and their operands and result values
- more detailed description in IRAdaptor reference
Analyzer:
- manages block layout and liveness information for all values
- function analysis done as a first pass before actually compiling the function
- iterates once over the basic blocks provided by the adaptor
- computes block layout mostly according to reverse post-order
- then one pass over all instructions collecting liveness information
- for each value, computes begin and end of its live interval, a refcount and a flag indicating whether the value's assignment can be free'd after its refcount reaches zero or whether the assignment can only be free'd after the end of the live interval
CompilerBase:
- main driver of compilation and provides the entry point into compilation [compile()]
- tracks currently active assignments (register information and stack slot for a value)
- contains architecture-independent logic of the register allocation (PHI handling, spilling decisions)
- when compilation starts, it sets up symbols for each function and gives derived compilers a chance to do initial setup (e.g. generation of data for global values)
- then proceeds in a per-function manner:
- notify Adaptor that a new function is being compiled
- ask analyzer to compute block layout and liveness information
- setup of static stack slots
- ask architecture-specific implementation to generate the function prologue and set-up assignments for arguments
- iterate over each block in the order specified by the analyzer:
- iterate over each instruction in the block as given by the Adaptor and ask the user-provided Compiler implementation to generate the machine code for the instruction
- free any assignments which are no longer live that could not be free'd when their refcount reached zero
 
 
CompilerBase for each architecture:
- architecture-specific functionality such as prologue/epilogue generation and argument setup
- otherwise, mostly helpers for commonly needed features like spilling/loading registers to/from the stack, constant materialization, code generation for (un)conditional branches and function call generation
Assembler:
- architecture- and platform-specific; currently only x86-64/AArch64 Linux ELF
- keeps track of symbols and section contents as well as relocations
- also houses unwind and C++ exception table information
- at the end, produces a finished object file or may also be able to map the code directly into memory and do all fixup required to make it executable
user-provided Compiler implementation:
- provides architecture-specific information about value parts
- value may be split into arbitrarily many parts
- each part has a specific size and register bank
- each part must fit into exactly one register
 
- some information for calling convention when dealing with multi-part values
- can a value be passed in both registers and the stack?
- does a value have to be aligned in registers?
 
- most importantly, provides a function to generate machine code for each instruction
- interlude: value assignment tracking
- value assignments are accessed through ValueRefs which does refcounting using RAII semantics
- ValueRefs then provide access to individual parts of a value using ValuePart(Refs) which can then be used to load them to registers, get their stack frame offset, etc.
- ValueRefs can also refer to a simple constant, when an assignment is accessed, user implementation gets a chance to simply return a constant ValueRef
 
- machine code generation complexity depends on instruction complexity
- can be as simple as accessing the assignment of the operands and result, loading the operands into registers, asking for a register for the result and then emitting instructions that implement the semantics of the IR instruction and make sure the result is in the result register given
Encoding Snippets:
- since writing down the instruction selection for each instruction is difficult, a lot of work and not portable, TPDE includes a tool to generate code that generates a specific sequence of instructions given a high-level function
- example: int addi32(int a, int b) { return a + b; } 
- the tool will now produce a function that will take two ValueParts corresponding to the two input registers, load both to registers, allocate a result register (or reuse one of the input registers if it can be overwritten), generate a LEAinstruction with these registers and return the result register to the caller.
- conceptually it will look like this: void encode_add(ValuePartRef&& a, ValuePartRef&& b, ScratchReg& result) {     AsmReg a_reg = a.load_to_reg();      AsmReg b_reg = b.load_to_reg();      AsmReg res_reg;     if (a.can_salvage())        res_reg = result.alloc(a.salvage());      else if (b.can_salvage())        res_reg = result.alloc(b.salvage());      else       res_reg = result.alloc();      ASM(LEA32rm, res_reg, MEM(a_reg, b_reg)); } 
- these function can be generated for multiple architectures and then called from architecture-independent code
- they can also take expressions that may be fused into memory operands of instructions
- significantly simplifies implementation of machine code generation
- instead of having to think about salvaging, loading registers, which instructions to use one can defer to the tool and focus on semantics
- e.g. instead of having to implement this for addition and subtraction, the implementation for these instructions can be as simple as bool compile_binary(IRInstRef inst, BinaryOp op) {   IRValueRef [lhs, rhs] = this->binary_ops(inst);   IRValueRef res = this->inst_res(inst);   ValueRef lhs = this->value_ref(lhs), rhs = this->value_ref(rhs);   ScratchReg result{this};   if (op == BinaryOp::add)     this->encode_add(lhs.part(0), rhs.part(0), result);   else     this->encode_sub(lhs.part(0), rhs.part(0), result);   this->result_ref(res).part(0).set_reg(result);   return true; } 
- extremely useful for fast porting to other architectures and when the necessary instruction sequences are long (e.g. float/double to/from int conversions)
 
- currently no support for branching to other basic blocks in the function or function calls