TPDE
Guide: TPDE example back-end

On this page, we will walk through implementing a back-end for the internal IR TPDE uses for testing. We will first explain the structure of the IR, implement the adaptor, fill in the basic required functions, implement the instructions and finally replace the manual instruction selection with one generated by the EncodeGen tool.

Note that this is intended as an outline on how to get a back-end going initially. For more advanced features and how to implement them, please see the Compiler Reference or the LLVM implementation of TPDE.

Test-IR structure

The IR is an SSA-based IR that can contain multiple functions per "module". Each function is a list of basic blocks, which are in turn lists of instructions whose order implies the execution order. Each instruction is at the same time a value, namely the result produced by it. The IR only has 64-bit integer values.

Stack slots are allocated using alloca instructions in the first basic block, which produce a pointer to their associated stack slots.

To unify values at control-flow edges, the IR uses phi nodes.

The textual representation can be described as follows (note that the IR can also describe pure control flow for testing, which is not relevant here):

<funcName>(%<valName>, %<valName>, ...) {
<blockName>:
    %<valName> = alloca <size>
    ; Force fixed assignment if there is space
    %<valName>! = [operation] %<op1>, %<op2>, ...
    %<valName> = [operation] %<valName>, %<valName>, ...
    br %<blockName>
    <or>
    condbr %<valName>, %<trueTarget>, %<falseTarget>
    <or>
    tbz %<valName>, %<trueTarget>, %<falseTarget>, <bit_idx> ; test-bit-zero
<blockName>:
    %<valName> = phi [%<blockName>, %<valName>], [%<blockName>, %<valName>], ...
    terminate ; unreachable
    <or>
    ret %<valName>
}
<funcName>...
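
For example, a small (hypothetical) function that adds its two arguments and returns the sum would be written as:

add_two(%a, %b) {
entry:
    %res = add %a, %b
    ret %res
}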

With regard to data structures, the IR is essentially parsed into four arrays which reference each other using indices:

struct TestIR {
    // ...
    // list of values (arguments/instructions)
    std::vector<Value> values;
    // list of value operands; an entry can be an index into values or blocks,
    // or an immediate value
    std::vector<u32> value_operands;
    // list of blocks
    std::vector<Block> blocks;
    // list of functions
    std::vector<Function> functions;
};

A function has a name, a few flags, the begin and end index of its blocks in the blocks vector, and the begin and end index of its argument values in the values vector:

struct Function {
    std::string name;
    // is there only a declaration for this function?
    bool declaration = false;
    // is this function only visible inside the current module (i.e. not externally)?
    bool local_only = false;
    // does the function contain a call to another function?
    bool has_call = false;
    // [begin, end[ in blocks
    u32 block_begin_idx = 0, block_end_idx = 0;
    // [begin, end[ in values
    u32 arg_begin_idx = 0, arg_end_idx = 0;
};

A block has a name, the range of its successor blocks, the range of its instructions and the range of its phi nodes, as well as the block storage required by the framework:

struct Block {
    std::string name;
    // [begin, end[ in value_operands
    u32 succ_begin_idx = 0, succ_end_idx = 0;
    // [begin, end[ in values, where [begin, phi_end[ are the phi instructions
    // (they must be at the beginning)
    u32 inst_begin_idx = 0, phi_end_idx = 0, inst_end_idx = 0;
    // storage for TPDE
    u32 block_info = 0, block_info2 = 0;
};

Finally, a value has a name, its instruction kind/type, its operation kind, whether it should be assigned a fixed register, and references to its operands.

struct Value {
    enum class Type : u8 {
        normal,
        arg,
        phi,
        terminator, // ret, branches, terminate
    };
    enum class Op : u8 {
        none, // not relevant
        any,  // not relevant
        add,
        sub,
        alloca,
        terminate,
        ret,
        br,
        condbr,
        tbz, // test-bit-zero
        jump,
        call,
        zerofill, // generate zeroes in the instruction stream
    };

    std::string name;
    Type type;
    Op op = Op::none;
    // should the value be assigned a fixed register?
    bool force_fixed_assignment = false;
    // for call only: the called function
    u32 call_func_idx;
    // number of value operands
    u32 op_count;
    // op_count value operands first, then blocks, then constants
    // [begin, end[ into value_operands
    u32 op_begin_idx, op_end_idx;
};

Adaptor

We will first define the reference types used: IRFuncRef, IRBlockRef, IRInstRef and IRValueRef. Since all of them are in different lists, it is natural to use their indices as the reference type. We will use u32 for this but in your case it might make more sense to use enum class types to have some type safety.

struct TestIRAdaptor {
    using IRFuncRef = u32;
    using IRBlockRef = u32;
    using IRInstRef = u32;
    using IRValueRef = u32;

    static constexpr IRFuncRef INVALID_FUNC_REF = u32(-1);
    static constexpr IRBlockRef INVALID_BLOCK_REF = u32(-1);
    static constexpr IRValueRef INVALID_VALUE_REF = u32(-1);
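
As an aside, a minimal sketch of what type-safe reference types could look like (hypothetical; the rest of this guide sticks with plain u32):

// hypothetical strongly-typed reference types
enum class FuncRef : u32 { invalid = u32(-1) };
enum class BlockRef : u32 { invalid = u32(-1) };
enum class ValueRef : u32 { invalid = u32(-1) };
// indexing now requires an explicit cast, which catches accidental mix-ups
// between the different index spaces at compile time:
//   ir->values[static_cast<u32>(val)]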

Then we define some configuration options. The adaptor can provide the highest local index a value can have, since we will use the value index as its local index. Additionally, arguments are not included in the normal instruction stream, so the liveness analysis has to visit them explicitly.

    static constexpr bool TPDE_PROVIDES_HIGHEST_VAL_IDX = true;
    static constexpr bool TPDE_LIVENESS_VISIT_ARGS = true;

Now we can start implementing the required functions. To access the IR, the adaptor will have a pointer to the IR structure. First, providing access to the list of functions:

    TestIR *ir;

    u32 func_count() const noexcept {
        return u32(ir->functions.size());
    }
    auto funcs() const noexcept {
        // needs to provide an iterator over IRFuncRef, which in our case are
        // indices, so we construct one using the ranges API
        return std::views::iota(0u, u32(ir->functions.size()));
    }
    auto funcs_to_compile() const noexcept {
        // filter out functions which are only declarations.
        // note that a data member cannot be captured directly,
        // so we capture a copy of the pointer instead
        return funcs() | std::views::filter([ir = this->ir](u32 idx) {
                   return !ir->functions[idx].declaration;
               });
    }

Then we add general function information:

    std::string_view func_link_name(IRFuncRef func) const noexcept {
        return ir->functions[func].name;
    }
    bool func_extern(IRFuncRef func) const noexcept {
        return ir->functions[func].declaration;
    }
    bool func_only_local(IRFuncRef func) const noexcept {
        return ir->functions[func].local_only;
    }
    static bool func_has_weak_linkage(IRFuncRef) noexcept {
        // the IR does not support weak linkage
        return false;
    }

Then we can add information about the current function. For this, we will add a member cur_func holding the currently selected function, which changes when switch_func is called. To return the highest local value index, which we said the adaptor provides, we will also add a member highest_local_val_idx that is calculated in switch_func.

    u32 cur_func = INVALID_FUNC_REF;
    u32 highest_local_val_idx = 0;

    static bool cur_needs_unwind_info() noexcept {
        // the IR doesn't support unwinding
        return false;
    }
    static bool cur_is_vararg() noexcept {
        // the IR doesn't support vararg functions
        return false;
    }
    u32 cur_highest_val_idx() const noexcept {
        return highest_local_val_idx;
    }
    auto cur_args() const noexcept {
        // return the range of argument value indices
        const auto &func = ir->functions[cur_func];
        return std::views::iota(func.arg_begin_idx, func.arg_end_idx);
    }
    // the IR does not support byval arguments or struct returns
    // since it only has 64-bit values
    static bool cur_arg_is_byval(u32) noexcept { return false; }
    static u32 cur_arg_byval_align(u32) noexcept { return 0; }
    static u32 cur_arg_byval_size(u32) noexcept { return 0; }
    static bool cur_arg_is_sret(u32) noexcept { return false; }

    auto cur_static_allocas() const noexcept {
        // all alloca instructions in the first basic block are static allocas,
        // so we filter them using ranges
        const auto &block = ir->blocks[ir->functions[cur_func].block_begin_idx];
        return std::views::iota(block.inst_begin_idx, block.inst_end_idx) |
               std::views::filter([ir = this->ir](u32 val) {
                   return ir->values[val].op == TestIR::Value::Op::alloca;
               });
    }
    static bool cur_has_dynamic_alloca() noexcept {
        // the IR does not support dynamic stack allocations
        return false;
    }
    IRBlockRef cur_entry_block() const noexcept {
        return ir->functions[cur_func].block_begin_idx;
    }
    auto cur_blocks() const noexcept {
        // simple iterator over the indices using the ranges API
        const auto &func = ir->functions[cur_func];
        return std::views::iota(func.block_begin_idx, func.block_end_idx);
    }

Then we can provide the information about blocks, which is mostly straightforward range manipulation:

    auto block_succs(IRBlockRef block) const noexcept {
        const auto &info = ir->blocks[block];
        // block successor indices are stored in value_operands
        const auto *data = ir->value_operands.data();
        // return a range over the indices in value_operands
        return std::ranges::subrange(data + info.succ_begin_idx,
                                     data + info.succ_end_idx);
    }
    auto block_insts(IRBlockRef block) const noexcept {
        const auto &info = ir->blocks[block];
        // return a direct range over the indices.
        // note that we start after the PHI nodes, which makes it easier
        // to implement the inst_operands function later
        return std::views::iota(info.phi_end_idx, info.inst_end_idx);
    }
    auto block_phis(IRBlockRef block) const noexcept {
        const auto &info = ir->blocks[block];
        // PHI nodes must be the first instructions in a block
        return std::views::iota(info.inst_begin_idx, info.phi_end_idx);
    }
    u32 block_info(IRBlockRef block) const noexcept {
        // provide access to the internal storage
        return ir->blocks[block].block_info;
    }
    void block_set_info(IRBlockRef block, u32 info) noexcept {
        ir->blocks[block].block_info = info;
    }
    u32 block_info2(IRBlockRef block) const noexcept {
        // provide access to the internal storage
        return ir->blocks[block].block_info2;
    }
    void block_set_info2(IRBlockRef block, u32 info) noexcept {
        ir->blocks[block].block_info2 = info;
    }
    std::string_view block_fmt_ref(IRBlockRef block) const noexcept {
        // when printing the block reference, we want to print the name
        return ir->blocks[block].name;
    }

Now follows the information about values. First is the local index for values. Since it should be dense and start at zero, we will use the index in the values vector but subtract the index of the first value in the function from it.

    u32 val_local_idx(IRValueRef val) const noexcept {
        return val - ir->functions[cur_func].arg_begin_idx;
    }
    bool val_ignore_in_liveness_analysis(IRValueRef val) const noexcept {
        // we do not want any liveness information for stack slots
        // since those are live throughout the whole function
        return ir->values[val].op == TestIR::Value::Op::alloca;
    }

Next is information about PHIs. For this, we need a helper class that keeps enough state to answer queries about the number of incoming values and their associated basic blocks.

    bool val_is_phi(IRValueRef val) const noexcept {
        return ir->values[val].type == TestIR::Value::Type::phi;
    }

    struct PHIRef {
        // The operands of a PHI are stored grouped by value and block,
        // so the value_operands slice referenced by a PHI node looks like this:
        // incoming_value1,incoming_value2,...,incoming_block1,incoming_block2,...
        // op_begin points to incoming_value1 and block_begin to incoming_block1
        const u32 *op_begin, *block_begin;

        u32 incoming_count() const noexcept {
            return u32(block_begin - op_begin);
        }
        IRValueRef incoming_val_for_slot(u32 slot) const noexcept {
            return op_begin[slot];
        }
        IRBlockRef incoming_block_for_slot(u32 slot) const noexcept {
            return block_begin[slot];
        }
        IRValueRef incoming_val_for_block(IRBlockRef block) const noexcept {
            // find the index in the block array, then index into the value array
            for (const auto *op = op_begin; op < block_begin; ++op) {
                if (block_begin[op - op_begin] == block) {
                    return static_cast<IRValueRef>(*op);
                }
            }
            return INVALID_VALUE_REF;
        }
    };

    PHIRef val_as_phi(IRValueRef val) const noexcept {
        const auto &info = ir->values[val];
        const auto *data = ir->value_operands.data();
        return PHIRef{data + info.op_begin_idx,
                      data + info.op_begin_idx + info.op_count};
    }
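
As a quick illustration, this is how compiler code might query a PHI node through this interface (a short sketch using only the functions defined above):

// assuming adaptor is a TestIRAdaptor and val refers to a PHI node
if (adaptor.val_is_phi(val)) {
    auto phi = adaptor.val_as_phi(val);
    for (u32 slot = 0; slot < phi.incoming_count(); ++slot) {
        auto in_val = phi.incoming_val_for_slot(slot);
        auto in_block = phi.incoming_block_for_slot(slot);
        // ... e.g. record that in_val must be available at the end of in_block
    }
}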

Then there's only information about allocas left.

    u32 val_alloca_size(IRValueRef val) const noexcept {
        // the size of an alloca is its immediate operand,
        // which is stored in value_operands
        const auto *data = ir->value_operands.data();
        return data[ir->values[val].op_begin_idx];
    }
    u32 val_alloca_align(IRValueRef val) const noexcept {
        // the alignment is the second immediate operand of an alloca
        const auto *data = ir->value_operands.data();
        return data[ir->values[val].op_begin_idx + 1];
    }
    std::string_view value_fmt_ref(IRValueRef val) const noexcept {
        // when formatting a value, we simply print its name
        return ir->values[val].name;
    }

Closing in, there's only a bit of information about instruction operands and results left. We only need to check whether an instruction actually produces a result.

    auto inst_operands(IRInstRef inst) const noexcept {
        // op_count indicates the number of value operands, which come first in
        // the operand list, so we can simply return a range over them
        const auto &info = ir->values[inst];
        const auto *data = ir->value_operands.data();
        return std::ranges::subrange(data + info.op_begin_idx,
                                     data + info.op_begin_idx + info.op_count);
    }
    auto inst_results(IRInstRef inst) const noexcept {
        bool is_def = false;
        switch (ir->values[inst].op) {
            using enum TestIR::Value::Op;
        case any:
        case add:
        case sub:
        case alloca:
        case call:
        case zerofill: is_def = true; break;
        default: break;
        }
        // either return the instruction reference itself or an empty range
        // if the instruction does not produce a result
        return std::views::single(inst) | std::views::drop(!is_def);
    }
    static bool inst_fused(IRInstRef) noexcept {
        // we don't fuse any instructions
        return false;
    }
    std::string_view inst_fmt_ref(IRInstRef inst) const noexcept {
        // to print an instruction, we simply print its name
        return ir->values[inst].name;
    }

Finally, we only need to implement the logic that calculates the highest local value index when the compiler switches to a new function:

    static void start_compile() {}
    static void end_compile() {}

    bool switch_func(IRFuncRef func) noexcept {
        cur_func = func;
        const auto &info = ir->functions[cur_func];
        // this is the number of values (arguments and instructions) in the function
        highest_local_val_idx =
            ir->blocks[info.block_end_idx - 1].inst_end_idx - info.arg_begin_idx;
        if (highest_local_val_idx > 0) {
            // convert the count into the highest valid local value index
            --highest_local_val_idx;
        }
        return true;
    }

    void reset() {
        // technically, we don't need to do anything
        cur_func = INVALID_FUNC_REF;
        highest_local_val_idx = 0;
    }
};

Basic Compiler structure

We will divide the compiler into two classes: One for architecture-independent/-spanning functionality and one for x86-64 specific details. Later we will also add an implementation for AArch64. These will live in two different files:

// TestIRCompilerBase.hpp
#include <tpde/CompilerBase.hpp>
// adaptor from earlier
#include "TestIRAdaptor.hpp"

template <typename Adaptor, typename Derived, typename Config>
struct TestIRCompilerBase : tpde::CompilerBase<TestIRAdaptor, Derived, Config> {
    using Base = tpde::CompilerBase<TestIRAdaptor, Derived, Config>;

    TestIRCompilerBase(TestIRAdaptor *adaptor) : Base{adaptor} {
        static_assert(tpde::Compiler<Derived, Config>);
        // the class has to take an adaptor type parameter, but we know that it
        // will always be the TestIRAdaptor
        static_assert(std::is_same_v<Adaptor, TestIRAdaptor>);
    }

    // shortcuts to access the derived class later
    Derived *derived() noexcept { return static_cast<Derived *>(this); }
    const Derived *derived() const noexcept {
        return static_cast<const Derived *>(this);
    }

    const TestIR *ir() const noexcept { return this->adaptor->ir; }

    // TODO
};
// TestIRCompilerX64.cpp
#include <tpde/x64/CompilerX64.hpp>
#include "TestIRCompilerBase.hpp"

// we will use the default config;
// if you don't need to change anything you can also omit it
struct CompilerConfig : tpde::x64::PlatformConfig {};

struct TestIRCompilerX64
    : tpde::x64::CompilerX64<TestIRAdaptor, TestIRCompilerX64, TestIRCompilerBase, CompilerConfig> {
    using Base = tpde::x64::CompilerX64<TestIRAdaptor, TestIRCompilerX64, TestIRCompilerBase, CompilerConfig>;

    explicit TestIRCompilerX64(TestIRAdaptor *adaptor) : Base{adaptor} {
        static_assert(tpde::Compiler<TestIRCompilerX64, tpde::x64::PlatformConfig>);
    }

    // TODO
};

Note that the x86-64 implementation is in a cpp file as, currently, we cannot have implementations for two different architectures compiled in the same translation unit at the same time.

Required functions

Base class

Since we only want to compile for x86-64 and AArch64, we can implement a few functions in an architecture-independent manner.

struct TestIRCompilerBase // ...
    bool cur_func_may_emit_calls() {
        // we can grab this directly from the IR
        return this->ir()->functions[this->adaptor->cur_func].has_call;
    }
    static typename Config::Assembler::SymRef cur_personality_func() {
        // as the IR has no means of handling exceptions or specifying unwind
        // actions, there are no personality functions
        return {};
    }
    bool try_force_fixed_assignment(IRValueRef value) const noexcept {
        // our example IR has a flag to try to force a fixed assignment on a value;
        // you likely don't want to do this
        return this->ir()->values[value].force_fixed_assignment;
    }
    // since both supported architectures have 64-bit GPRs, the register
    // usage for both is the same.
    // further, since all values are 64-bit integers, we can statically determine
    // the return values of all ValueParts functions.
    // if your IR supports more types, it is likely best to precompute the register
    // usage in val_parts and then store it as data members in ValueParts
    struct ValueParts {
        // all values use one register
        static u32 count() { return 1; }
        // all values are 64-bit
        static u32 size_bytes(u32 /* part_idx */) { return 8; }
        // all values are integers and therefore use GP registers
        static tpde::RegBank reg_bank(u32 /* part_idx */) {
            return Config::GP_BANK;
        }
    };
    static ValueParts val_parts(IRValueRef) { return ValueParts{}; }

    // our IR has no special values/constants
    static std::optional<ValRefSpecial> val_ref_special(IRValueRef) {
        return {};
    }
    ValuePartRef val_part_ref_special(ValRefSpecial &, u32 /* part_idx */) noexcept {
        TPDE_UNREACHABLE("val_part_ref_special called when IR does not have special values");
    }

    static void define_func_idx(IRFuncRef func, u32 idx) {
        // we do not need this information; however, when compiling calls we rely
        // on the index here being the same as the index of the function in the IR,
        // so we assert this here.
        // (if that weren't the case, we could simply use a lookup array)
        assert(func == idx);
    }
    bool compile_inst(IRInstRef inst, InstRange remaining) noexcept {
        // since we want to compile some instructions in an architecture-independent
        // way, we implement this function in TestIRCompilerBase and call into the
        // derived class for the actual code generation where necessary
        const TestIR::Value &value = this->ir()->values[inst];
        // arguments and PHIs should not show up in the block_insts range
        assert(value.type == TestIR::Value::Type::normal ||
               value.type == TestIR::Value::Type::terminator);
        // simply switch over the opcode and compile
        switch (value.op) {
            using enum TestIR::Value::Op;
        case alloca:
            // we do not need to do anything for allocas, as the framework
            // already allocated the stack slots for them and their value
            // will be calculated on demand
            return true;
        case add: return derived()->compile_add(inst);
        case sub: return derived()->compile_sub(inst);
        case ret: return this->compile_ret(inst);
        case terminate: return this->compile_terminate();
        case br: return derived()->compile_br(inst);
        case condbr: return derived()->compile_condbr(inst);
        case tbz: return derived()->compile_tbz(inst);
        case call: return derived()->compile_call(inst);
        default: TPDE_LOG_ERR("encountered unimplemented instruction"); return false;
        }
    }

Note that for two instructions, ret and terminate, we do not call into the derived class. This is because their logic does not require us to emit any architecture-specific instructions ourselves.

    bool compile_terminate() noexcept {
        // this will restore callee-saved registers and return;
        // it is implemented in tpde::x64::CompilerX64 or tpde::a64::CompilerA64
        derived()->gen_func_epilog();
        // registers may only be released after the return has been generated
        this->release_regs_after_return();
        return true;
    }
    bool compile_ret(IRInstRef inst) noexcept {
        // a return simply has to move the value to be returned into the return
        // register. since we only have single-register integer values, this is
        // very easy to implement in the base class using the calling convention
        // information from the derived class
        const TestIR::Value &value = this->ir()->values[inst];
        IRValueRef ret_op = this->ir()->value_operands[value.op_begin_idx];
        // create the RetBuilder
        RetBuilder rb{this, *this->cur_cc_assigner()};
        rb.add(ret_op);
        // generate the return
        rb.ret();
        return true;
    }
};

x86-64 class

In TestIRCompilerX64, currently no function is required by the framework.

struct TestIRCompilerX64 // ...
};

As for the functions referenced by the base class, we can simply stub them out with implementations which return false for now.
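
For example (a minimal sketch; these stubs get their real implementations in the following sections):

// in TestIRCompilerX64; temporary stubs so the code compiles
bool compile_add(IRInstRef) { return false; }
bool compile_sub(IRInstRef) { return false; }
bool compile_br(IRInstRef) { return false; }
bool compile_condbr(IRInstRef) { return false; }
bool compile_tbz(IRInstRef) { return false; }
bool compile_call(IRInstRef) { return false; }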

Compiling to an object file

To compile to an object file, we simply need to construct an adaptor and a compiler and then let them compile.

TestIR *ir = /* parse IR */;
TestIRAdaptor adaptor{ir};
TestIRCompilerX64 compiler{&adaptor};
if (!compiler.compile()) {
    TPDE_LOG_ERR("Failed to compile IR");
    return 1;
}
std::vector<u8> data = compiler.assembler.build_object_file();
std::ofstream out_file{obj_out_path.Get(), std::ios::binary};
out_file.write(reinterpret_cast<const char *>(data.data()), data.size());

If you are interested in the full code you can check out tpde/src/test/test_main.cpp in the source tree.

Instruction selection

We have already implemented instruction selection for returns, so we can already compile simple functions like this:

simple_ret(%a, %b) {
entry:
    ; X64: sub rsp,0x30
    ; X64-NEXT: mov rax,rsi
    ; X64: add rsp,0x30
    ret %b
}

which will give us an object file with this code:

simple_ret:
    push rbp
    mov rbp, rsp
    sub rsp, 0x30
    nop word ptr [rax + rax]
    mov rax, rsi
    add rsp, 0x30
    pop rbp
    ret

We can now move on to simple operations, branches and calls.

Simple operations

The IR only has add and sub as simple operations, which we can implement in a rather straightforward manner following the guide in the Compiler Reference:

// in TestIRCompilerX64
bool compile_add(IRInstRef inst) {
    const TestIR::Value &value = ir()->values[inst];
    const auto lhs_idx = this->ir()->value_operands[value.op_begin_idx];
    const auto rhs_idx = this->ir()->value_operands[value.op_begin_idx + 1];
    auto [lhs_ref, lhs_part] = this->val_ref_single(lhs_idx);
    auto [rhs_ref, rhs_part] = this->val_ref_single(rhs_idx);
    auto [res_ref, res_part] = this->result_ref_single(inst);
    AsmReg lhs_reg = lhs_part.load_to_reg();
    AsmReg rhs_reg = rhs_part.load_to_reg();
    AsmReg res_reg = res_part.alloc_try_reuse(lhs_part);
    if (res_reg == lhs_reg) {
        ASM(ADD64rr, res_reg, rhs_reg);
    } else {
        // generates lea <res_reg>, [<lhs_reg> + <rhs_reg>]
        ASM(LEA64rm, res_reg, FE_MEM(lhs_reg, 1, rhs_reg, 0));
    }
    res_part.set_modified();
    return true;
}
bool compile_sub(IRInstRef inst) {
    const TestIR::Value &value = ir()->values[inst];
    const auto lhs_idx = this->ir()->value_operands[value.op_begin_idx];
    const auto rhs_idx = this->ir()->value_operands[value.op_begin_idx + 1];
    auto [lhs_ref, lhs_part] = this->val_ref_single(lhs_idx);
    auto [rhs_ref, rhs_part] = this->val_ref_single(rhs_idx);
    auto [res_ref, res_part] = this->result_ref_single(inst);
    // we overwrite the lhs destructively, so we need to make sure we may do that
    ValuePartRef tmp_part = lhs_part.into_temporary();
    AsmReg lhs_reg = tmp_part.cur_reg();
    AsmReg rhs_reg = rhs_part.load_to_reg();
    ASM(SUB64rr, lhs_reg, rhs_reg);
    res_part.set_value(std::move(tmp_part));
    return true;
}

Unconditional branches

Unconditional branches are also straightforward to follow from the Compiler Reference: Simply grab the block reference and ask the framework to do an unconditional jump.

bool compile_br(IRInstRef inst) noexcept {
    const TestIR::Value &value = ir()->values[inst];
    IRBlockRef target = this->ir()->value_operands[value.op_begin_idx];
    const auto spilled = this->spill_before_branch();
    this->begin_branch_region();
    this->generate_branch_to_block(Jump::jmp, target, /* needs_split = */ false, /* last_inst = */ true);
    this->end_branch_region();
    this->release_spilled_regs(spilled);
    return true;
}

Conditional branches

Conditional branches require us to emit the comparisons ourselves (as they are built into the branch) and then select the suitable branch condition. Otherwise, we can copy from the reference again.

void generate_condbr(Jump cc, IRBlockRef true_target, IRBlockRef false_target) noexcept {
    bool true_needs_split = this->branch_needs_split(true_target);
    bool false_needs_split = this->branch_needs_split(false_target);
    const IRBlockRef next_block = this->analyzer.block_ref(this->next_block());
    // let the framework spill
    const auto spilled = this->spill_before_branch();
    this->begin_branch_region();
    if (next_block == true_target || (next_block != false_target && true_needs_split)) {
        // if the following block is the true target, or if we always have to emit
        // a branch but a branch to the true block is heavy (i.e. needs to be split),
        // then we want to jump to the false block first
        this->generate_branch_to_block(this->invert_jump(cc), false_target, false_needs_split, /* last_inst = */ false);
        // if the next block is the true_target, this jump will not be emitted
        this->generate_branch_to_block(Jump::jmp, true_target, /* needs_split = */ false, /* last_inst = */ true);
    } else {
        // try to elide the branch to the false_target
        this->generate_branch_to_block(cc, true_target, true_needs_split, /* last_inst = */ false);
        this->generate_branch_to_block(Jump::jmp, false_target, /* needs_split = */ false, /* last_inst = */ true);
    }
    this->end_branch_region();
    this->release_spilled_regs(spilled);
}
bool compile_condbr(IRInstRef inst) noexcept {
    const TestIR::Value &value = ir()->values[inst];
    IRValueRef cond_val = this->ir()->value_operands[value.op_begin_idx];
    IRBlockRef true_target = this->ir()->value_operands[value.op_begin_idx + 1];
    IRBlockRef false_target = this->ir()->value_operands[value.op_begin_idx + 2];
    auto [_, cond_part] = this->val_ref_single(cond_val);
    auto cond_reg = cond_part.load_to_reg();
    ASM(CMP64ri, cond_reg, 0);
    // we want to jump to the true target if the condition is not zero, so Jump::jne
    this->generate_condbr(Jump::jne, true_target, false_target);
    return true;
}
bool compile_tbz(IRInstRef inst) noexcept {
    const TestIR::Value &value = ir()->values[inst];
    IRValueRef cond_val = this->ir()->value_operands[value.op_begin_idx];
    IRBlockRef true_target = this->ir()->value_operands[value.op_begin_idx + 1];
    IRBlockRef false_target = this->ir()->value_operands[value.op_begin_idx + 2];
    u32 bit_idx = this->ir()->value_operands[value.op_begin_idx + 3];
    auto [_, cond_part] = this->val_ref_single(cond_val);
    auto cond_reg = cond_part.load_to_reg();
    // if we can encode the bit as a 32-bit immediate, we use the test instruction;
    // otherwise we fall back to a bit-test, which is a bit slower
    Jump cc;
    if (bit_idx < 31) {
        cc = Jump::jne; // jump to true_target if the bit is not zero
        ASM(TEST64ri, cond_reg, (1u << bit_idx));
    } else {
        cc = Jump::jb; // jump if the carry flag is set
        ASM(BT64ri, cond_reg, bit_idx);
    }
    this->generate_condbr(cc, true_target, false_target);
    return true;
}

Calls

Now the only major feature left is calls. Since our IR does not do sign-/zero-extension or other fancy stuff with the arguments, we can simply reuse much of the code from the reference.

bool compile_call(IRInstRef inst) noexcept {
    const TestIR::Value &value = ir()->values[inst];
    const auto func_idx = value.call_func_idx;
    auto operands = std::span<const IRValueRef>{
        this->ir()->value_operands.data() + value.op_begin_idx, value.op_count};
    auto [_, res_part] = this->result_ref_single(inst);
    util::SmallVector<CallArg, 8> arguments{};
    for (IRValueRef op : operands) {
        arguments.push_back(CallArg{op});
    }
    this->generate_call(this->func_syms[func_idx], // define_func_idx asserted the indices match
                        arguments,
                        std::span{static_cast<ValuePart *>(&res_part), 1}, // cast needed as ValuePartRef : ValuePart
                        /* vararg = */ false);
    return true;
}

This concludes everything we need to do to have a basic x86-64 implementation of a compiler back-end for our IR.

Using EncodeGen

To not have to implement instruction selection for everything ourselves, we're going to start using the EncodeGen tool TPDE provides to implement add/sub.

We first have to write a file with the snippets, generate a header containing the snippet encoders, add them to the compiler, and then call them.

Snippet File

Since our IR is simple, the snippets will also be very simple:

// snippets.c
unsigned long addi64(unsigned long a, unsigned long b) { return a + b; }
unsigned long subi64(unsigned long a, unsigned long b) { return a - b; }

Generating Snippet Encoders

To generate snippet encoders for x86-64, we will use the following invocations of clang and tpde-encodegen following the reference:

clang -c -emit-llvm -ffreestanding -fcf-protection=none -O3 -fomit-frame-pointer -fno-math-errno --target=x86_64 -march=x86-64-v4 -o snippets_x64.bc snippets.c
tpde_encodegen -o snippet_encoders_x64.hpp snippets_x64.bc
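
The generated header defines an EncodeCompiler class template with one encode function per snippet. Roughly sketched, based on how we call them below (an assumption; the exact generated signatures may differ):

// sketch of the generated interface, not verbatim tool output
bool encode_addi64(ValuePartRef lhs, ValuePartRef rhs, ScratchReg &result);
bool encode_subi64(ValuePartRef lhs, ValuePartRef rhs, ScratchReg &result);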

Add them to the compiler

Following the reference again, we will add the EncodeCompiler defined in snippet_encoders_x64.hpp to TestIRCompilerX64:

// TestIRCompilerX64.cpp
#include "snippet_encoders_x64.hpp"

struct TestIRCompilerX64
    : tpde::x64::CompilerX64<TestIRAdaptor, TestIRCompilerX64, TestIRCompilerBase, CompilerConfig>,
      tpde_encodegen::EncodeCompiler<TestIRAdaptor, TestIRCompilerX64, TestIRCompilerBase, CompilerConfig> {
    using Base = tpde::x64::CompilerX64<TestIRAdaptor, TestIRCompilerX64, TestIRCompilerBase, CompilerConfig>;
    using EncCompiler = tpde_encodegen::EncodeCompiler<TestIRAdaptor, TestIRCompilerX64, TestIRCompilerBase, CompilerConfig>;

    explicit TestIRCompilerX64(TestIRAdaptor *adaptor) : Base{adaptor} {
        static_assert(tpde::Compiler<TestIRCompilerX64, tpde::x64::PlatformConfig>);
    }

    void reset() noexcept {
        // we need to override reset to call both Base's and EncCompiler's reset function
        Base::reset();
        EncCompiler::reset();
    }
};

Replacing architecture-dependent code

Now we can simply replace our implementations for compile_add/compile_sub in TestIRCompilerX64 and move them to TestIRCompilerBase.

// in TestIRCompilerBase
bool compile_add(IRInstRef inst) {
    const TestIR::Value &value = ir()->values[inst];
    const auto lhs_idx = this->ir()->value_operands[value.op_begin_idx];
    const auto rhs_idx = this->ir()->value_operands[value.op_begin_idx + 1];
    ValueRef lhs_ref = this->val_ref(lhs_idx);
    ValueRef rhs_ref = this->val_ref(rhs_idx);
    ValueRef res_ref = this->result_ref(inst);
    ScratchReg res_scratch{this};
    if (!derived()->encode_addi64(lhs_ref.part(0), rhs_ref.part(0), res_scratch)) {
        return false;
    }
    this->set_value(res_ref.part(0), res_scratch);
    return true;
}
bool compile_sub(IRInstRef inst) {
    const TestIR::Value &value = ir()->values[inst];
    const auto lhs_idx = this->ir()->value_operands[value.op_begin_idx];
    const auto rhs_idx = this->ir()->value_operands[value.op_begin_idx + 1];
    ValueRef lhs_ref = this->val_ref(lhs_idx);
    ValueRef rhs_ref = this->val_ref(rhs_idx);
    ValueRef res_ref = this->result_ref(inst);
    ScratchReg res_scratch{this};
    if (!derived()->encode_subi64(lhs_ref.part(0), rhs_ref.part(0), res_scratch)) {
        return false;
    }
    this->set_value(res_ref.part(0), res_scratch);
    return true;
}

Note that you would probably want to combine both of these into one function that simply switches on the binary operation.
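For instance, a combined version could look like this (a sketch reusing only the code shown above):

// in TestIRCompilerBase; handles both add and sub
bool compile_binop(IRInstRef inst) {
    const TestIR::Value &value = ir()->values[inst];
    const auto lhs_idx = this->ir()->value_operands[value.op_begin_idx];
    const auto rhs_idx = this->ir()->value_operands[value.op_begin_idx + 1];
    ValueRef lhs_ref = this->val_ref(lhs_idx);
    ValueRef rhs_ref = this->val_ref(rhs_idx);
    ValueRef res_ref = this->result_ref(inst);
    ScratchReg res_scratch{this};
    // dispatch to the matching snippet encoder
    bool ok = false;
    switch (value.op) {
    case TestIR::Value::Op::add:
        ok = derived()->encode_addi64(lhs_ref.part(0), rhs_ref.part(0), res_scratch);
        break;
    case TestIR::Value::Op::sub:
        ok = derived()->encode_subi64(lhs_ref.part(0), rhs_ref.part(0), res_scratch);
        break;
    default: break;
    }
    if (!ok) {
        return false;
    }
    this->set_value(res_ref.part(0), res_scratch);
    return true;
}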

Porting to AArch64

To port the compiler, we need to create a new cpp file with a TestIRCompilerA64 and implement the required functionality as well as the instructions that EncodeGen currently cannot generate, which includes branches to other basic blocks and calls. Therefore, we need to implement call, br, condbr and tbz again.

First, we generate snippet encoders for AArch64:

clang -c -emit-llvm -ffreestanding -fcf-protection=none -O3 -fomit-frame-pointer -fno-math-errno --target=aarch64 -march=armv8.1-a -o snippets_a64.bc snippets.c
tpde_encodegen -o snippet_encoders_a64.hpp snippets_a64.bc

And then we can create the compiler:

// TestIRCompilerA64.cpp
#include <tpde/a64/CompilerA64.hpp>
#include "snippet_encoders_a64.hpp"

struct TestIRCompilerA64
    : tpde::a64::CompilerA64<TestIRAdaptor, TestIRCompilerA64, TestIRCompilerBase, CompilerConfig>,
      tpde_encodegen::EncodeCompiler<TestIRAdaptor, TestIRCompilerA64, TestIRCompilerBase, CompilerConfig> {
    using Base = tpde::a64::CompilerA64<TestIRAdaptor, TestIRCompilerA64, TestIRCompilerBase, CompilerConfig>;
    using EncCompiler = tpde_encodegen::EncodeCompiler<TestIRAdaptor, TestIRCompilerA64, TestIRCompilerBase, CompilerConfig>;

    explicit TestIRCompilerA64(TestIRAdaptor *adaptor) : Base{adaptor} {
        static_assert(tpde::Compiler<TestIRCompilerA64, tpde::a64::PlatformConfig>);
    }

    void reset() noexcept {
        // we need to override reset to call both Base's and EncCompiler's reset function
        Base::reset();
        EncCompiler::reset();
    }

    static tpde::a64::CallingConv cur_calling_convention() {
        return tpde::a64::CallingConv::SYSV_CC;
    }

    // TODO: implementations for call, br, condbr and tbz
};

We will omit the implementations for the architecture-specific instructions since they are very similar to the implementations provided above.

Now we only need to change the driver to compile the IR for both architectures. As we cannot include both compilers in the same translation unit due to macro clashes, we need to provide an implementation to compile the IR in each TU, behind an interface that doesn't require including TPDE headers.

// TestIRCompilerX64.cpp
std::optional<std::vector<u8>> compile_ir_x64(TestIR *ir) {
    TestIRAdaptor adaptor{ir};
    TestIRCompilerX64 compiler{&adaptor};
    if (!compiler.compile()) {
        return {};
    }
    return compiler.assembler.build_object_file();
}

// TestIRCompilerA64.cpp
std::optional<std::vector<u8>> compile_ir_a64(TestIR *ir) {
    TestIRAdaptor adaptor{ir};
    TestIRCompilerA64 compiler{&adaptor};
    if (!compiler.compile()) {
        return {};
    }
    return compiler.assembler.build_object_file();
}
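
Both entry points can be declared in a small shared header that the two TUs and the driver include (a sketch; the file name is hypothetical):

// TestIRCompilers.hpp (hypothetical name) -- deliberately free of TPDE includes
#include <optional>
#include <vector>
struct TestIR;
// u8 is the project's byte alias (e.g. using u8 = uint8_t;)
std::optional<std::vector<u8>> compile_ir_x64(TestIR *ir);
std::optional<std::vector<u8>> compile_ir_a64(TestIR *ir);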

and change the main compilation code accordingly:

TestIR *ir = /* parse IR */;
std::optional<std::vector<u8>> data{};
if (arch.Get() == Arch::X64) {
    data = compile_ir_x64(ir);
} else {
    data = compile_ir_a64(ir);
}
if (!data) {
    TPDE_LOG_ERR("Failed to compile IR");
    return 1;
}
std::ofstream out_file{obj_out_path.Get(), std::ios::binary};
out_file.write(reinterpret_cast<const char *>(data->data()), data->size());