TPDE
Loading...
Searching...
No Matches
Overview

Framework divides back-end into 7 components:

  • user-provided IRAdaptor
  • Analyzer for computing block layout and liveness information
  • base Compiler which drives compilation
  • base Compiler for each architecture
  • Assembler to handle object file generation/mapping
  • user-provided instruction-selection
  • encoding snippets generated by the framework from user-provided high-level functions
┌───────────┐ ┌───────────────────────────┐ ┌───────────┐
│ │◄─────┼ ├────►│ │
│ IRAdaptor │ │ CompilerBase │ │ Assembler │
│ │ │ │ │ │
└───────────┘ │ ┌────────┐ │ └───────────┘
│ │Analyzer│ │
│ └────────┘ │
└─┬───▲──────────┬──▲────▲──┘
│ │ │ │ │
│ │ │ │ │
┌───▼───┴──────┐ │ │ ┌─┴─────────────────┐
│ Compiler for │◄──┼──┼──┤ Encoding Snippets │
│ <arch> │ │ │ │ (optional) │
└──────┬─▲─────┘ │ │ └─▲─────────────────┘
│ │ │ │ │
│ │ │ │ │
│ │ │ │ │
┌─▼─┴─────────▼──┴────┴─────┐
│ │
│ Compiler specific to <IR> │
│ │
└───────────────────────────┘

IRAdaptor:

  • user-provided class
  • provides functions and definition with which the Compiler can interact with the IR
  • this includes a list of functions
  • before a function is compiled, adaptor gets a chance to precompute any necessary data structures during compilation
  • afterwards adaptor needs to provide an iterator over function arguments, static stack slots, basic blocks, PHIs in each block, instructions in each basic block and their operands and result values
  • more detailed description in IRAdaptor reference

Analyzer:

  • manages block layout and liveness information for all values
  • function analysis done as a first pass before actually compiling the function
  • iterates once over the basic blocks provided by the adaptor
  • computes block layout mostly according to reverse post-order
  • then one pass over all instructions collecting liveness information
  • for each value, computes begin and end of its live interval, a refcount and a flag indicating whether the value's assignment can be free'd after its refcount reaches zero or whether the assignment can only be free'd after the end of the live interval

CompilerBase:

  • main driver of compilation and provides the entry point into compilation [compile()]
  • tracks currently active assignments (register information and stack slot for a value)
  • contains architecture-independent logic of the register allocation (PHI handling, spilling decisions)
  • when compilation starts, it sets up symbols for each function and gives derived compilers a chance to do initial setup (e.g. generation of data for global values)
  • then proceeds in a per-function manner:
    • notify Adaptor that a new function is being compiled
    • ask analyzer to compute block layout and liveness information
    • setup of static stack slots
    • ask architecture-specific implementation to generate the function prologue and set-up assignments for arguments
    • iterate over each block in the order specified by the analyzer:
      • iterate over each instruction in the block as given by the Adaptor and ask the user-provided Compiler implementation to generate the machine code for the instruction
      • free any assignments which are no longer live that could not be free'd when their refcount reached zero

CompilerBase for each architecture:

  • architecture-specific functionality such as prologue/epilogue generation and argument setup
  • otherwise, mostly helpers for commonly needed features like spilling/loading registers to/from the stack, constant materialization, code generation for (un)conditional branches and function call generation

Assembler:

  • architecture- and platform-specific; currently only x86-64/AArch64 Linux ELF
  • keeps track of symbols and section contents as well as relocations
  • also houses unwind and C++ exception table information
  • at the end, produces a finished object file or may also be able to map the code directly into memory and do all fixup required to make it executable

user-provided Compiler implementation:

  • provides architecture-specific information about value parts
    • value may be split into arbitrarily many parts
    • each part has a specific size and register bank
    • each part must fit into exactly one register
  • some information for calling convention when dealing with multi-part values
    • can a value be passed in both registers and the stack?
    • does a value have to be aligned in registers?
  • most importantly, provides a function to generate machine code for each instruction
  • interlude: value assignment tracking
    • value assignments are accessed through ValueRefs which does refcounting using RAII semantics
    • ValueRefs then provide access to individual parts of a value using ValuePart(Refs) which can then be used to load them to registers, get their stack frame offset, etc.
    • ValueRefs can also refer to a simple constant, when an assignment is accessed, user implementation gets a chance to simply return a constant ValueRef
    • EXAMPLE
  • machine code generation complexity depends on instruction complexity
  • can be as simple as accessing the assignment of the operands and result, loading the operands into registers, asking for a register for the result and then emitting instructions that implement the semantics of the IR instruction and make sure the result is in the result register given
  • TODO: EXAMPLE? may be a bit overwhelming

encoding snippets:

  • since writing down the instruction selection for each instruction is difficult, a lot of work and not portable, TPDE includes a tool to generate code that generates a specific sequence of instructions given a high-level function
  • example:
    int addi32(int a, int b) { return a + b; }
    which yields the following machine code
    lea eax, [edi + esi]
    ret
    • the tool will now produce a function that will take two ValueParts corresponding to the two input registers, load both to registers, allocate a result register (or reuse one of the input registers if it can be overwritten), generate a LEA instruction with these registers and return the result register to the caller.
    • conceptually it will look like this:
      void encode_add(ValuePartRef&& a, ValuePartRef&& b, ScratchReg& result) {
      AsmReg a_reg = a.load_to_reg(); // make sure a is in a register
      AsmReg b_reg = b.load_to_reg(); // make sure b is in a register
      AsmReg res_reg;
      if (a.can_salvage()) // is the value of a needed later?
      res_reg = result.alloc(a.salvage()); // can reuse register of a
      else if (b.can_salvage()) // is the value of b needed later?
      res_reg = result.alloc(b.salvage()); // can reuse register of b
      else
      res_reg = result.alloc(); // need to allocate a new register
      ASM(LEA32rm, res_reg, MEM(a_reg, b_reg));
      }
    • these function can be generated for multiple architectures and then called from architecture-independent code
    • they can also take expressions that may be fused into memory operands of instructions
    • significantly simplifies implementation of machine code generation
      • instead of having to think about salvaging, loading registers, which instructions to use one can defer to the tool and focus on semantics
      • e.g. instead of having to implement this for addition and subtraction, the implementation for these instructions can be as simple as
        bool compile_binary(IRInstRef inst, BinaryOp op) {
        IRValueRef [lhs, rhs] = this->binary_ops(inst);
        IRValueRef res = this->inst_res(inst);
        ValueRef lhs = this->value_ref(lhs), rhs = this->value_ref(rhs);
        ScratchReg result{this};
        if (op == BinaryOp::add)
        this->encode_add(lhs.part(0), rhs.part(0), result);
        else
        this->encode_sub(lhs.part(0), rhs.part(0), result);
        this->result_ref(res).part(0).set_reg(result);
        return true;
        }
        which is completely architecture-independent without any regards to the actual instructions being generated
      • extremely usefule for fast porting to other architectures and when the necessary instruction sequences are long (e.g. float/double to/from int conversions)
    • currently no support for branching to other basic blocks in the function or function calls

Code Layout: