On this page, we will walk through implementing a back-end for the internal IR TPDE uses for testing. We will first explain the structure of the IR, implement the adaptor, fill in the basic required functions, implement the instructions and finally replace the manual instruction selection with one generated by the EncodeGen tool.
Note that this is intended as an outline of how to get a back-end going initially. For more advanced features and how to implement them, please see the Compiler Reference or the LLVM implementation of TPDE.
Test-IR structure
The IR is an SSA IR that can contain multiple functions per "module". Each function is a list of basic blocks, which are in turn lists of instructions that imply the execution order. Each instruction is at the same time a value, namely the result produced by it. The IR only has 64-bit integer values.
Stack slots are allocated using alloca instructions in the first basic block, which produce a pointer to their associated stack slot.
To unify values at control-flow edges, the IR uses phi nodes.
The textual representation can be described as follows (note that the IR can also describe pure control flow for testing, which is not relevant here):
<funcName>(%<valName>, %<valName>, ...) {
<blockName>:
%<valName> = alloca <size>
; Force fixed assignment if there is space
%<valName>! = [operation] %<op1>, %<op2>, ...
%<valName> = [operation] %<valName>, %<valName>, ...
br %<blockName>
<or>
condbr %<valName>, %<trueTarget>, %<falseTarget>
<or>
tbz %<valName>, %<trueTarget>, %<falseTarget>, <bit_idx> ; test-bit-zero
<blockName>:
%<valName> = phi [%<blockName>, %<valName>], [%<blockName>, %<valName>], ...
terminate ; unreachable
<or>
ret %<valName>
}
<funcName>...
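To make the grammar concrete, here is a small hypothetical function in this format (the names are invented for illustration): it selects between %a and %b based on %cond, unifying the two paths with a phi at the join block.

```
select(%cond, %a, %b) {
entry:
condbr %cond, %takeA, %takeB
takeA:
br %end
takeB:
br %end
end:
%res = phi [%takeA, %a], [%takeB, %b]
ret %res
}
```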
With regards to data structures, the IR is essentially parsed into four arrays which can then reference into each other using indices:
struct TestIR {
std::vector<Value> values;
std::vector<u32> value_operands;
std::vector<Block> blocks;
std::vector<Function> functions;
};
A function has a name, a few flags, the begin and end index of its blocks in the blocks vector, and the begin and end index of its argument values in the values vector:
struct Function {
std::string name;
bool declaration = false;
bool local_only = false;
bool has_call = false;
u32 block_begin_idx = 0, block_end_idx = 0;
u32 arg_begin_idx = 0, arg_end_idx = 0;
};
A block has a name, the range of its successor blocks, the range of its instructions, and the range of its phi nodes, as well as the block storage required by the framework:
struct Block {
std::string name;
u32 succ_begin_idx = 0, succ_end_idx = 0;
u32 inst_begin_idx = 0, phi_end_idx = 0, inst_end_idx = 0;
u32 block_info = 0, block_info2 = 0;
};
Finally, a value has a name, its instruction kind/type, its operation kind, whether it should be assigned a fixed register, and references to its operands.
struct Value {
enum class Type : u8 {
normal,
arg,
phi,
terminator,
};
enum class Op : u8 {
none,
any,
add,
sub,
alloca,
terminate,
ret,
br,
condbr,
tbz,
jump,
call,
zerofill,
};
std::string name;
Type type;
Op op = Op::none;
bool force_fixed_assignment = false;
u32 call_func_idx;
u32 op_count;
u32 op_begin_idx, op_end_idx;
};
Adaptor
We will first define the reference types used: IRFuncRef, IRBlockRef, IRInstRef and IRValueRef. Since all of them live in different lists, it is natural to use their indices as the reference type. We will use u32 for this, but in your case it might make more sense to use enum class types to get some type safety.
struct TestIRAdaptor {
using IRFuncRef = u32;
using IRBlockRef = u32;
using IRInstRef = u32;
using IRValueRef = u32;
static constexpr IRFuncRef INVALID_FUNC_REF = u32(-1);
static constexpr IRBlockRef INVALID_BLOCK_REF = u32(-1);
static constexpr IRValueRef INVALID_VALUE_REF = u32(-1);
Then we define some configuration options. The adaptor can provide the highest local index a value can have, since we will use the value index as its local index. Arguments are not included in the normal instruction stream, so the liveness analysis has to visit them explicitly.
static constexpr bool TPDE_PROVIDES_HIGHEST_VAL_IDX = true;
static constexpr bool TPDE_LIVENESS_VISIT_ARGS = true;
Now we can start implementing the required functions. To access the IR, the adaptor will have a pointer to the IR structure. First, providing access to the list of functions:
TestIR *ir;
u32 func_count() const noexcept {
return u32(ir->functions.size());
}
auto funcs() const noexcept {
return std::views::iota(0u, u32(ir->functions.size()));
}
auto funcs_to_compile() const noexcept {
return funcs() | std::views::filter([this](u32 idx) { return !ir->functions[idx].declaration; });
}
Then we add general function information:
std::string_view func_link_name(IRFuncRef func) const noexcept {
return ir->functions[func].name;
}
bool func_extern(IRFuncRef func) const noexcept {
return ir->functions[func].declaration;
}
bool func_only_local(IRFuncRef func) const noexcept {
return ir->functions[func].local_only;
}
static bool func_has_weak_linkage(IRFuncRef) noexcept {
return false;
}
Then we can add information about the current function. For this we will add a member cur_func which holds the currently selected function and which we change when switch_func is called. To return the highest local value index, which we said the adaptor provides, we also add a member highest_local_val_idx which is calculated in switch_func.
u32 cur_func = INVALID_FUNC_REF;
u32 highest_local_val_idx = 0;
static bool cur_needs_unwind_info() noexcept {
return false;
}
static bool cur_is_vararg() noexcept {
return false;
}
u32 cur_highest_val_idx() const noexcept {
return highest_local_val_idx;
}
auto cur_args() const noexcept {
const Function& func = ir->functions[cur_func];
return std::views::iota(func.arg_begin_idx, func.arg_end_idx);
}
static bool cur_arg_is_byval(u32) noexcept { return false; }
static u32 cur_arg_byval_align(u32) noexcept { return 0; }
static u32 cur_arg_byval_size(u32) noexcept { return 0; }
static bool cur_arg_is_sret(u32) noexcept { return false; }
auto cur_static_allocas() const noexcept {
const auto &block = ir->blocks[ir->functions[cur_func].block_begin_idx];
return std::views::iota(block.inst_begin_idx, block.inst_end_idx) |
std::views::filter([this](u32 val) {
return ir->values[val].op == TestIR::Value::Op::alloca;
});
}
static bool cur_has_dynamic_alloca() noexcept {
return false;
}
IRBlockRef cur_entry_block() const noexcept {
return ir->functions[cur_func].block_begin_idx;
}
auto cur_blocks() const noexcept {
const auto &func = ir->functions[cur_func];
return std::views::iota(func.block_begin_idx, func.block_end_idx);
}
Then we can provide the information about blocks, which is mostly straightforward range manipulation:
auto block_succs(IRBlockRef block) const noexcept {
const auto &info = ir->blocks[block];
const auto *data = ir->value_operands.data();
return std::ranges::subrange(data + info.succ_begin_idx,
data + info.succ_end_idx);
}
auto block_insts(IRBlockRef block) const noexcept {
const auto &info = ir->blocks[block];
return std::views::iota(info.phi_end_idx, info.inst_end_idx);
}
auto block_phis(IRBlockRef block) const noexcept {
const auto &info = ir->blocks[block];
return std::views::iota(info.inst_begin_idx, info.phi_end_idx);
}
u32 block_info(IRBlockRef block) const noexcept {
return ir->blocks[block].block_info;
}
void block_set_info(IRBlockRef block, u32 info) noexcept {
ir->blocks[block].block_info = info;
}
u32 block_info2(IRBlockRef block) const noexcept {
return ir->blocks[block].block_info2;
}
void block_set_info2(IRBlockRef block, u32 info) noexcept {
ir->blocks[block].block_info2 = info;
}
std::string_view block_fmt_ref(IRBlockRef block) const noexcept {
return ir->blocks[block].name;
}
Now follows the information about values. First is the local index for values. Since it should be dense and start at zero, we use the index in the values vector, minus the index of the first value in the function.
u32 val_local_idx(IRValueRef val) const noexcept {
return val - ir->functions[cur_func].arg_begin_idx;
}
bool val_ignore_in_liveness_analysis(IRValueRef val) const noexcept {
return ir->values[val].op == TestIR::Value::Op::alloca;
}
Next is information about PHIs. For this we need to create a helper class that keeps enough state to report the number of incoming values and their associated basic blocks.
bool val_is_phi(IRValueRef val) const noexcept {
return ir->values[val].type == TestIR::Value::Type::phi;
}
struct PHIRef {
u32* op_begin, *block_begin;
u32 incoming_count() const noexcept {
return u32(block_begin - op_begin);
}
IRValueRef incoming_val_for_slot(u32 slot) const noexcept {
return op_begin[slot];
}
IRBlockRef incoming_block_for_slot(u32 slot) const noexcept {
return block_begin[slot];
}
IRValueRef incoming_val_for_block(IRBlockRef block) const noexcept {
for (auto *op = op_begin; op < block_begin; ++op) {
if (block_begin[op - op_begin] == block) {
return static_cast<IRValueRef>(*op);
}
}
return INVALID_VALUE_REF;
}
};
PHIRef val_as_phi(IRValueRef val) const noexcept {
const auto &info = ir->values[val];
const auto *data = ir->value_operands.data();
return PHIRef{data + info.op_begin_idx,
data + info.op_begin_idx + info.op_count};
}
Then there's only information about allocas left.
u32 val_alloca_size(IRValueRef val) const noexcept {
const auto *data = ir->value_operands.data();
return data[ir->values[val].op_begin_idx];
}
u32 val_alloca_align(IRValueRef val) const noexcept {
const auto *data = ir->value_operands.data();
return data[ir->values[val].op_begin_idx + 1];
}
std::string_view value_fmt_ref(IRValueRef val) const noexcept {
return ir->values[val].name;
}
Closing in, there's only a bit of information about instruction operands and results left. We only need to check whether an instruction actually produces a result.
auto inst_operands(IRInstRef inst) const noexcept {
const auto &info = ir->values[inst];
const auto *data = ir->value_operands.data();
return std::ranges::subrange(data + info.op_begin_idx,
data + info.op_begin_idx + info.op_count);
}
auto inst_results(IRInstRef inst) const noexcept {
const auto &info = ir->values[inst];
bool is_def = false;
switch (ir->values[inst].op) {
using enum TestIR::Value::Op;
case any:
case add:
case sub:
case alloca:
case call:
case zerofill: is_def = true; break;
default: break;
}
return std::views::single(inst) | std::views::drop(!is_def);
}
static bool inst_fused(IRInstRef) noexcept {
return false;
}
std::string_view inst_fmt_ref(IRInstRef inst) const noexcept {
return ir->values[inst].name;
}
Finally, we only need to implement the logic that calculates the highest local value index when the compiler switches to a new function:
static void start_compile() {}
static void end_compile() {}
bool switch_func(IRFuncRef func) noexcept {
cur_func = func;
highest_local_val_idx = 0;
const auto &info = ir->functions[cur_func];
highest_local_val_idx =
ir->blocks[info.block_end_idx - 1].inst_end_idx - info.arg_begin_idx;
if (highest_local_val_idx > 0) {
--highest_local_val_idx;
}
return true;
}
void reset() {
cur_func = INVALID_FUNC_REF;
highest_local_val_idx = 0;
}
};
Basic Compiler structure
We will divide the compiler into two classes: One for architecture-independent/-spanning functionality and one for x86-64 specific details. Later we will also add an implementation for AArch64. These will live in two different files:
#include <tpde/CompilerBase.hpp>
#include "TestIRAdaptor.hpp"
template <typename Adaptor, typename Derived, typename Config>
struct TestIRCompilerBase : tpde::CompilerBase<Adaptor, Derived, Config> {
using Base = tpde::CompilerBase<Adaptor, Derived, Config>;
TestIRCompilerBase(TestIRAdaptor *adaptor) : Base{adaptor} {
static_assert(tpde::Compiler<Derived, Config>);
static_assert(std::is_same_v<Adaptor, TestIRAdaptor>);
}
Derived *derived() noexcept { return static_cast<Derived *>(this); }
const Derived *derived() const noexcept {
return static_cast<const Derived *>(this);
}
const TestIR *ir() const noexcept { return this->adaptor->ir; }
};
#include <tpde/x64/CompilerX64.hpp>
#include "TestIRCompilerBase.hpp"
struct CompilerConfig : tpde::x64::PlatformConfig {};
struct TestIRCompilerX64 : tpde::x64::CompilerX64<TestIRAdaptor, TestIRCompilerX64, TestIRCompilerBase, CompilerConfig> {
using Base = tpde::x64::CompilerX64<TestIRAdaptor, TestIRCompilerX64, TestIRCompilerBase, CompilerConfig>;
explicit TestIRCompilerX64(TestIRAdaptor *adaptor)
: Base{adaptor} {
static_assert(tpde::Compiler<TestIRCompilerX64, tpde::x64::PlatformConfig>);
}
};
Note that the x86-64 implementation is in a cpp file as, currently, we cannot have implementations for two different architectures compiled in the same translation unit at the same time.
Required functions
Base class
Since we only want to compile for x86-64 and AArch64, we can implement a few functions in an architecture-independent manner.
struct TestIRCompilerBase {
bool cur_func_may_emit_calls() {
return this->ir()->functions[this->adaptor->cur_func].has_call;
}
static typename CompilerConfig::Assembler::SymRef cur_personality_func() {
return {};
}
bool try_force_fixed_assignment(IRValueRef value) const noexcept {
return this->ir()->values[value].force_fixed_assignment;
}
struct ValueParts {
static u32 count() { return 1; }
static u32 size_bytes(u32 ) { return 8; }
static tpde::RegBank reg_bank(u32 ) {
return CompilerConfig::GP_BANK;
}
};
static ValueParts val_parts(IRValueRef) { return ValueParts{}; }
static std::optional<ValRefSpecial> val_ref_special(IRValueRef) {
return {};
}
ValuePartRef val_part_ref_special(ValRefSpecial&, u32 ) noexcept {
TPDE_UNREACHABLE("val_part_ref_special called when IR does not have special values");
}
static void define_func_idx(IRFuncRef func, u32 idx) {
assert(func == idx);
}
bool compile_inst(IRInstRef inst, InstRange remaining) noexcept {
const TestIR::Value &value = this->analyzer.adaptor->ir->values[inst];
assert(value.type == TestIR::Value::Type::normal ||
value.type == TestIR::Value::Type::terminator);
switch (value.op) {
using enum TestIR::Value::Op;
case alloca:
return true;
case add: return derived()->compile_add(inst);
case sub: return derived()->compile_sub(inst);
case ret: return this->compile_ret(inst);
case terminate: return this->compile_terminate();
case br: return derived()->compile_br(inst);
case condbr: return derived()->compile_condbr(inst);
case tbz: return derived()->compile_tbz(inst);
case call: return derived()->compile_call(inst);
default: TPDE_LOG_ERR("encountered unimplemented instruction"); return false;
}
}
Note that for two instructions, ret and terminate, we do not call into the derived class, because their logic does not require emitting any architecture-specific instructions ourselves.
bool compile_terminate() noexcept {
derived()->gen_func_epilog();
this->release_regs_after_return();
return true;
}
bool compile_ret(IRInstRef inst) noexcept {
const TestIR::Value &value = this->analyzer.adaptor->ir->values[inst];
IRValueRef ret_op = this->ir()->value_operands[value.op_begin_idx];
RetBuilder rb{this, *this->cur_cc_assigner()};
rb.add(ret_op);
rb.ret();
return true;
}
};
x86-64 class
In TestIRCompilerX64, currently no function is required by the framework.
struct TestIRCompilerX64 {
};
As for the functions referenced by the base class, we can simply stub them out with implementations that return false for now.
Compiling to an object file
To compile to an object file, we simply need to construct an adaptor and a compiler and then let them compile.
TestIR *ir = /* ... */;
TestIRAdaptor adaptor{ir};
TestIRCompilerX64 compiler{&adaptor};
if (!compiler.compile()) {
TPDE_LOG_ERR("Failed to compile IR");
return 1;
}
std::vector<u8> data = compiler.assembler.build_object_file();
std::ofstream out_file{obj_out_path.Get(), std::ios::binary};
out_file.write(reinterpret_cast<const char *>(data.data()), data.size());
If you are interested in the full code you can check out tpde/src/test/test_main.cpp
in the source tree.
Instruction selection
We have already implemented instruction selection for returns, so we can already compile simple functions like this:
simple_ret(%a, %b) {
entry:
; X64: sub rsp,0x30
; X64-NEXT: mov rax,rsi
; X64: add rsp,0x30
ret %b
}
which will give us an object file with this code:
simple_ret:
push rbp
mov rbp, rsp
sub rsp, 0x30
nop word ptr [rax + rax]
mov rax, rsi
add rsp, 0x30
pop rbp
ret
We can now move on to simple operations, branches and calls.
Simple operations
The IR only has add and sub as simple operations, which we can implement in a rather straightforward manner following the guide in the Compiler Reference:
bool compile_add(IRInstRef inst) {
const TestIR::Value &value = ir()->values[static_cast<u32>(inst)];
const auto lhs_idx = this->ir()->value_operands[value.op_begin_idx];
const auto rhs_idx = this->ir()->value_operands[value.op_begin_idx + 1];
auto [lhs_ref, lhs_part] = this->val_ref_single(lhs_idx);
auto [rhs_ref, rhs_part] = this->val_ref_single(rhs_idx);
auto [res_ref, res_part] = this->result_ref_single(inst);
AsmReg lhs_reg = lhs_part.load_to_reg();
AsmReg rhs_reg = rhs_part.load_to_reg();
AsmReg res_reg = res_part.alloc_try_reuse(lhs_part);
if (res_reg == lhs_reg) {
ASM(ADD64rr, res_reg, rhs_reg);
} else {
ASM(LEA64rm, res_reg, FE_MEM(lhs_reg, 1, rhs_reg, 0));
}
res_part.set_modified();
return true;
}
bool compile_sub(IRInstRef inst) {
const TestIR::Value &value = ir()->values[static_cast<u32>(inst)];
const auto lhs_idx = this->ir()->value_operands[value.op_begin_idx];
const auto rhs_idx = this->ir()->value_operands[value.op_begin_idx + 1];
auto [lhs_ref, lhs_part] = this->val_ref_single(lhs_idx);
auto [rhs_ref, rhs_part] = this->val_ref_single(rhs_idx);
auto [res_ref, res_part] = this->result_ref_single(inst);
ValuePartRef tmp_part = lhs_part.into_temporary();
AsmReg lhs_reg = tmp_part.cur_reg();
AsmReg rhs_reg = rhs_part.load_to_reg();
ASM(SUB64rr, lhs_reg, rhs_reg);
res_part.set_value(std::move(tmp_part));
return true;
}
Unconditional branches
Unconditional branches are also straightforward to follow from the Compiler Reference: Simply grab the block reference and ask the framework to do an unconditional jump.
bool compile_br(IRInstRef inst) noexcept {
const TestIR::Value &value = ir()->values[static_cast<u32>(inst)];
IRBlockRef target = this->ir()->value_operands[value.op_begin_idx];
const auto spilled = this->spill_before_branch();
this->begin_branch_region();
this->generate_branch_to_block(Jump::jmp, target, false, true);
this->end_branch_region();
this->release_spilled_regs(spilled);
return true;
}
Conditional branches
Conditional branches require us to do the comparisons (as they are built into the branch) and then select the suitable branch condition. Otherwise we can copy from the reference again.
void generate_condbr(Jump cc, IRBlockRef true_target,
IRBlockRef false_target) noexcept {
bool true_needs_split = this->branch_needs_split(true_target);
bool false_needs_split = this->branch_needs_split(false_target);
const IRBlockRef next_block = this->analyzer.block_ref(this->next_block());
const auto spilled = this->spill_before_branch();
this->begin_branch_region();
if (next_block == true_target || (next_block != false_target && true_needs_split)) {
this->generate_branch_to_block(this->invert_jump(cc), false_target, false_needs_split, false);
this->generate_branch_to_block(Jump::jmp, true_target, false, true);
} else {
this->generate_branch_to_block(cc, true_target, true_needs_split, false);
this->generate_branch_to_block(Jump::jmp, false_target, false, true);
}
this->end_branch_region();
this->release_spilled_regs(spilled);
}
bool compile_condbr(IRInstRef inst) noexcept {
const TestIR::Value &value = ir()->values[static_cast<u32>(inst)];
IRValueRef cond_val = this->ir()->value_operands[value.op_begin_idx];
IRBlockRef true_target = this->ir()->value_operands[value.op_begin_idx + 1];
IRBlockRef false_target = this->ir()->value_operands[value.op_begin_idx + 2];
auto [_, cond_part] = this->val_ref_single(cond_val);
auto cond_reg = cond_part.load_to_reg();
ASM(CMP64ri, cond_reg, 0);
this->generate_condbr(Jump::jne, true_target, false_target);
return true;
}
bool compile_tbz(IRInstRef inst) noexcept {
const TestIR::Value &value = ir()->values[static_cast<u32>(inst)];
IRValueRef cond_val = this->ir()->value_operands[value.op_begin_idx];
IRBlockRef true_target = this->ir()->value_operands[value.op_begin_idx + 1];
IRBlockRef false_target = this->ir()->value_operands[value.op_begin_idx + 2];
u32 bit_idx = this->ir()->value_operands[value.op_begin_idx + 3];
auto [_, cond_part] = this->val_ref_single(cond_val);
auto cond_reg = cond_part.load_to_reg();
Jump cc;
if (bit_idx < 31) {
cc = Jump::jne;
ASM(TEST64ri, cond_reg, (1u << bit_idx));
} else {
cc = Jump::jb;
ASM(BT64ri, cond_reg, bit_idx);
}
this->generate_condbr(cc, true_target, false_target);
return true;
}
Calls
Now the only major feature left is calls. Since our IR does not do sign-/zero-extension or other fancy stuff with the arguments we can simply reuse much of the code from the reference.
bool compile_call(IRInstRef inst) noexcept {
const TestIR::Value &value = ir()->values[static_cast<u32>(inst)];
const auto func_idx = value.call_func_idx;
auto operands = std::span<IRValueRef>{(this->ir()->value_operands.data() + value.op_begin_idx),
value.op_count};
auto [_, res_part] = this->result_ref_single(inst);
util::SmallVector<CallArg, 8> arguments{};
for (IRValueRef op : operands) {
arguments.push_back(CallArg{op});
}
this->generate_call(this->func_syms[func_idx],
arguments,
std::span{static_cast<ValuePart *>(&res_part), 1},
false);
return true;
}
This concludes everything we need to do to have a basic x86-64 implementation of a compiler back-end for our IR.
Using EncodeGen
To not have to implement instruction selection for everything ourselves, we're going to start using the EncodeGen tool TPDE provides to implement add/sub.
We will first have to add a file with snippets, generate a file containing the snippet encoders, add them to the compiler, and then call them.
Snippet File
Since our IR is simple, the snippets will also be very simple:
unsigned long addi64(unsigned long a, unsigned long b) { return a + b; }
unsigned long subi64(unsigned long a, unsigned long b) { return a - b; }
Generating Snippet Encoders
To generate snippet encoders for x86-64, we use the following invocations of clang and tpde_encodegen, following the reference:
clang -c -emit-llvm -ffreestanding -fcf-protection=none -O3 -fomit-frame-pointer -fno-math-errno --target=x86_64 -march=x86-64-v4 -o snippets_x64.bc snippets.c
tpde_encodegen -o snippet_encoders_x64.hpp snippets_x64.bc
Add them to the compiler
Following the reference again, we add the EncodeCompiler defined in snippet_encoders_x64.hpp to TestIRCompilerX64:
#include "snippet_encoders_x64.hpp"
struct TestIRCompilerX64 : tpde::x64::CompilerX64<TestIRAdaptor, TestIRCompilerX64, TestIRCompilerBase, CompilerConfig>,
tpde_encodegen::EncodeCompiler<TestIRAdaptor, TestIRCompilerX64, TestIRCompilerBase, CompilerConfig> {
using Base = tpde::x64::CompilerX64<TestIRAdaptor, TestIRCompilerX64, TestIRCompilerBase, CompilerConfig>;
using EncCompiler = tpde_encodegen::EncodeCompiler<TestIRAdaptor, TestIRCompilerX64, TestIRCompilerBase, CompilerConfig>;
explicit TestIRCompilerX64(TestIRAdaptor *adaptor)
: Base{adaptor} {
static_assert(tpde::Compiler<TestIRCompilerX64, tpde::x64::PlatformConfig>);
}
void reset() noexcept {
Base::reset();
EncCompiler::reset();
}
};
Replacing architecture-dependent code
Now we can simply replace our implementations of compile_add/compile_sub in TestIRCompilerX64 and move them to TestIRCompilerBase.
bool compile_add(IRInstRef inst) {
const TestIR::Value &value = ir()->values[static_cast<u32>(inst)];
const auto lhs_idx = this->ir()->value_operands[value.op_begin_idx];
const auto rhs_idx = this->ir()->value_operands[value.op_begin_idx + 1];
ValueRef lhs_ref = this->val_ref(lhs_idx);
ValueRef rhs_ref = this->val_ref(rhs_idx);
ValueRef res_ref = this->result_ref(inst);
ScratchReg res_scratch{this};
if (!derived()->encode_addi64(lhs_ref.part(0), rhs_ref.part(0), res_scratch)) {
return false;
}
this->set_value(res_ref.part(0), res_scratch);
return true;
}
bool compile_sub(IRInstRef inst) {
const TestIR::Value &value = ir()->values[static_cast<u32>(inst)];
const auto lhs_idx = this->ir()->value_operands[value.op_begin_idx];
const auto rhs_idx = this->ir()->value_operands[value.op_begin_idx + 1];
ValueRef lhs_ref = this->val_ref(lhs_idx);
ValueRef rhs_ref = this->val_ref(rhs_idx);
ValueRef res_ref = this->result_ref(inst);
ScratchReg res_scratch{this};
if (!derived()->encode_subi64(lhs_ref.part(0), rhs_ref.part(0), res_scratch)) {
return false;
}
this->set_value(res_ref.part(0), res_scratch);
return true;
}
Note that you would probably want to combine both of these into one function that simply switches on the binary operation.
Porting to AArch64
To port the compiler, we need to create a new cpp file with a TestIRCompilerA64 and implement its required functionality, plus the instructions that EncodeGen currently cannot generate, which includes branches to other basic blocks and calls. Therefore, we need to implement call, br, condbr and tbz again.
First, we generate snippet encoders for AArch64:
clang -c -emit-llvm -ffreestanding -fcf-protection=none -O3 -fomit-frame-pointer -fno-math-errno --target=aarch64 -march=armv8.1-a -o snippets_a64.bc snippets.c
tpde_encodegen -o snippet_encoders_a64.hpp snippets_a64.bc
And then we can create the compiler:
#include <tpde/a64/CompilerA64.hpp>
#include "snippet_encoders_a64.hpp"
struct TestIRCompilerA64 : tpde::a64::CompilerA64<TestIRAdaptor, TestIRCompilerA64, TestIRCompilerBase, CompilerConfig>,
tpde_encodegen::EncodeCompiler<TestIRAdaptor, TestIRCompilerA64, TestIRCompilerBase, CompilerConfig> {
using Base = tpde::a64::CompilerA64<TestIRAdaptor, TestIRCompilerA64, TestIRCompilerBase, CompilerConfig>;
using EncCompiler = tpde_encodegen::EncodeCompiler<TestIRAdaptor, TestIRCompilerA64, TestIRCompilerBase, CompilerConfig>;
explicit TestIRCompilerA64(TestIRAdaptor *adaptor)
: Base{adaptor} {
static_assert(tpde::Compiler<TestIRCompilerA64, tpde::a64::PlatformConfig>);
}
void reset() noexcept {
Base::reset();
EncCompiler::reset();
}
static tpde::a64::CallingConv cur_calling_convention() {
return tpde::a64::CallingConv::SYSV_CC;
}
};
We will omit the implementations for the architecture-specific instructions since they are very similar to the implementations provided above.
Now we only need to change the implementation to compile the IR for both architectures. As we cannot include both compilers in the same translation unit due to macro clashes, we need to provide implementations to compile the IR in both TUs behind an interface that doesn't require including TPDE headers.
std::optional<std::vector<u8>> compile_ir_x64(TestIR* ir) {
TestIRAdaptor adaptor{ir};
TestIRCompilerX64 compiler{&adaptor};
if (!compiler.compile()) {
return {};
}
return compiler.assembler.build_object_file();
}
std::optional<std::vector<u8>> compile_ir_a64(TestIR* ir) {
TestIRAdaptor adaptor{ir};
TestIRCompilerA64 compiler{&adaptor};
if (!compiler.compile()) {
return {};
}
return compiler.assembler.build_object_file();
}
and change the main compilation code accordingly:
TestIR *ir = /* ... */;
std::optional<std::vector<u8>> data{};
if (arch.Get() == Arch::X64) {
data = compile_ir_x64(ir);
} else {
data = compile_ir_a64(ir);
}
if (!data) {
TPDE_LOG_ERR("Failed to compile IR");
return 1;
}
std::ofstream out_file{obj_out_path.Get(), std::ios::binary};
out_file.write(reinterpret_cast<const char *>(data->data()), data->size());