This is the reference for the CompilerBase
and related classes. CompilerBase is the main driver of compilation and the class of which you, as the user, will ultimately create an instance to start compilation.
The reference starts by explaining the basic class hierarchy used in the compiler, ...
Class Hierarchy
- the compiler uses static polymorphism, i.e. templates using CRTP, to adapt the compiler to your IR and call your provided functions
- CompilerBase takes three template parameters
- Adaptor class type
- final derived type
- config for Assembler type, some flags and architecture-specific information
- architecture-specific Compiler class (currently x64/arm64) which takes four template parameters
- Adaptor class type
- final derived type
- base compiler type (either CompilerBase or a user-provided base class that inherits from CompilerBase and can be used for architecture-independent functionality)
- config providing the CompilerBase config and some architecture-specific configuration
- final compiler class which inherits from the architecture-specific compiler class and provides the template types
- many functions in the base classes are called through the derived class, giving you the option to override them or inject custom behavior (see the sketch below the diagram)
┌─────────┐ ┌────────────┐ ┌─────────┐
│IRAdaptor◄─ ─ ─ ─ ─┤CompilerBase├─ ─ ─ ─ ─►Assembler│
└─────────┘ └─────▲──────┘ └─────────┘
│
│
┌───────┴────────┐
│UserCompilerBase│
└───────▲────────┘
│
│
┌──────┴───────┐
│Compiler{Arch}│
└──────▲───────┘
│
│
┌────────┴─────────┐
│UserCompiler{Arch}│
└──────────────────┘
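To make the hierarchy concrete, here is a minimal sketch of how these pieces could be wired up for x86-64. This is not verbatim TPDE code: the names in the tpde::x64 namespace, and whether the base compiler is passed as a concrete type or as a template, are assumptions; consult CompilerX64.hpp for the exact template-parameter forms.
// hedged sketch of the CRTP setup, names are assumptions
struct MyAdaptor { /* implements the IRAdaptor interface for your IR */ };
// final compiler (the "UserCompiler{Arch}" box above)
struct MyCompilerX64
    : tpde::x64::CompilerX64<
          MyAdaptor,       // adaptor class type
          MyCompilerX64,   // final derived type (CRTP)
          tpde::CompilerBase<MyAdaptor, MyCompilerX64, tpde::x64::PlatformConfig>, // base compiler type
          tpde::x64::PlatformConfig> { // config
  // required hooks (val_parts, compile_inst, ...) and instruction selection go here
};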
Config
- type which provides typedefs and constants used to configure the compiler
- mostly internal or currently not useful (defined in CompilerConfig.hpp/CompilerX64.hpp/CompilerA64.hpp)
- e.g. provides the Assembler typedef which tells the compiler which assembler to use; currently only one assembler is supported per architecture
- the most interesting option is DEFAULT_VAR_REF_HANDLING which, when turned off, allows you to set up custom variable refs, e.g. for globals (see later; a minimal config sketch follows this list)
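As a minimal sketch (assuming the config can be extended by inheritance and that tpde::x64::PlatformConfig is the x86-64 default; check CompilerConfig.hpp/CompilerX64.hpp for the real definitions), a config that only opts out of the default variable-ref handling could look like:
// hedged sketch: the inheritance pattern and member form are assumptions
struct MyConfig : tpde::x64::PlatformConfig {
  // opt into custom variable-ref handling (see "Custom Variable Ref Handling" below)
  static constexpr bool DEFAULT_VAR_REF_HANDLING = false;
};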
Required functions to implement
General
cur_func_may_emit_calls
bool cur_func_may_emit_calls() const noexcept;
Can the compilation of any instruction in the current function emit calls? This may be used to optimize register allocation or stack frame setup. If you can cheaply provide the answer, you should. Otherwise you currently should always return true.
cur_personality_func
typename CompilerConfig::Assembler::SymRef cur_personality_func() noexcept;
Returns the personality function of the current function. This is relevant for exception handling and cleanup actions performed on unwinding.
try_force_fixed_assignment
bool try_force_fixed_assignment(IRValueRef val) const noexcept;
Should the register allocator try to allocate a fixed register for val even if its heuristic tells it not to? Mostly useful for debugging; you should return false.
val_parts
struct ValParts {
u32 count() const noexcept;
u32 size_bytes(u32 part_idx) const noexcept;
RegBank reg_bank(u32 part_idx) const noexcept;
};
ValParts val_parts(IRValueRef val) const noexcept;
This function should return an object that provides information about how a value is divided into parts. Each part corresponds to a register the value will be stored in. For each value, you need to provide the number of parts and, for each part, its size in bytes and register bank. The RegBank type is defined in RegisterFile.hpp. The possible banks are defined in the PlatformConfigs provided by the architecture-specific Compiler classes.
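As a minimal sketch, an IR in which every value is a single 64-bit integer could implement this as follows (assuming the general-purpose bank constant lives in your config, as in the constant-handling examples later in this document):
// hedged sketch: every value is one 8-byte part in the general-purpose bank
struct ValParts {
  u32 count() const noexcept { return 1; }
  u32 size_bytes(u32) const noexcept { return 8; }
  RegBank reg_bank(u32) const noexcept { return Config::GP_BANK; }
};
ValParts val_parts(IRValueRef) const noexcept { return ValParts{}; }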
val_ref_special
using ValRefSpecial = /* ... */;
std::optional<ValRefSpecial> val_ref_special(IRValueRef val) noexcept;
With this function you have the possibility to return custom ValueRefs when you or the compiler calls val_ref with val. This is useful to implement constants that should not be materialized in registers if not needed or otherwise do not need an assignment. There is a default ValRefSpecial defined in CompilerBase, but you can optionally override this typedef if you need to store additional information for when individual parts of that value are accessed.
More about this later when explaining how to implement constants.
[!warning] The ValRefSpecial type is used in a union, so it must be a standard-layout struct and have u8 mode as its first member, which has to have a value greater than 3.
val_part_ref_special
ValuePartRef val_part_ref_special(ValRefSpecial& val_ref_special, u32 part_idx) noexcept;
When you return a ValRefSpecial in val_ref_special and a part of it is accessed, this function is called and you need to provide a ValuePartRef for it.
More details about ValuePartRefs can be found below.
define_func_idx
void define_func_idx(IRFuncRef func, u32 idx) noexcept;
Internally, the compiler numbers functions and this callback gives you the chance to save that number if you need it later, e.g. to access the created function symbols. This function is optional to implement.
setup_var_ref_assignments
void setup_var_ref_assignments() noexcept;
This function is called when you set DEFAULT_VAR_REF_HANDLING to false in the config, asking you to set up assignments for values which are marked as "variable references". The details of this are explained further down below.
load_address_of_var_reference
void load_address_of_var_reference(AsmReg dst, AssignmentPartRef ap) noexcept;
This function is called whenever the ValuePartRef of a value which is marked as being a custom variable reference is needed in a register. This function is only needed when DEFAULT_VAR_REF_HANDLING is false. The details of this mechanism are explained further down below.
cur_cc_assigner
CCAssigner* cur_cc_assigner() noexcept;
This function provides a calling convention assigner for the current function. It is optional to implement but can be used if different calling conventions than the default C calling convention should be used. Note that the pointer returned is non-owning so you need to allocate storage that is live throughout the function for the assigner.
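For example, a compiler that always uses the SysV convention on x86-64 could keep the assigner as a member so its storage outlives the current function. This is a hedged sketch; tpde::x64::CCAssignerSysV is only mentioned below in the CallBuilder section, and how you construct or reset it per function is up to your implementation.
// hedged sketch: one SysV assigner kept as a member of the compiler
tpde::x64::CCAssignerSysV sysv_assigner;
CCAssigner* cur_cc_assigner() noexcept { return &sysv_assigner; }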
compile_inst
bool compile_inst(IRInstRef inst, InstRange remaining_instructions) noexcept;
This function is called when the compiler iterates through the blocks in the code generation pass; it asks you to generate machine code for an instruction and to update the value assignments for the instruction's result values. The return value indicates whether compilation of the instruction succeeded.
How to implement this function is explained in the next section.
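Typically this function is a dispatch over your IR's opcodes to per-instruction handlers like the ones shown in the next section. A minimal sketch, assuming a hypothetical inst_kind query on your adaptor and an InstKind enum of your IR:
// hedged sketch: InstKind and inst_kind() are assumptions about your IR/adaptor
bool compile_inst(IRInstRef inst, InstRange remaining) noexcept {
  switch (this->adaptor->inst_kind(inst)) {
  case InstKind::Add64: return compile_add64(inst);
  case InstKind::Ret: return compile_ret(inst);
  default: return false; // unsupported instruction
  }
}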
How to compile instructions
very simple
- simple example: instruction without any operands, no result, no control flow
- each architecture compiler provides ASM* macros to append instructions to the current function
- the assembler libraries fadec and disarm are used for encoding; the instruction name used in the macro is derived from the naming in these libraries
bool compile_first_inst() {
ASM(AND32ri, AsmReg::AX, 0xFFFFFFFF);
return true;
}
- registers given to the encoders do not encode their size; the operand size is part of the instruction name
defining results
- suppose we have an instruction that simply produces a zero value
- have to get result ValueRef
- ValueRef is a RAII wrapper that manages the refcount for a value
- the analyzer counts the definition and each use (so you will have to retrieve a ValueRef for each definition or use as an operand)
- on destruction, a ValueRef will decrement the refcount and free associated resources if the value becomes unused
- allows some manual refcounting if necessary (e.g. for fusing)
- when defining a value we use the result_ref function to get the ValueRef
- for each part that we use we can get a ValuePartRef using part
- a register can be allocated for the definition using alloc_reg
[!note] this locks the register for the ValuePartRef, meaning it cannot be evicted until the ValuePartRef is destructed or you call unlock
- when a value is defined, you have to call set_modified to tell the framework that the value needs to be spilled if the register is reused
bool compile_zero_inst(IRInstRef inst) {
IRValueRef res_val = /* ... */;
ValueRef res_ref = this->result_ref(res_val);
ValuePartRef res_part = res_ref.part(0);
AsmReg res_reg = res_part.alloc_reg();
ASM(XOR32rr, res_reg, res_reg);
res_part.set_modified();
return true;
}
- if the value has multiple parts to zero, e.g. a 128-bit integer on a 64-bit architecture, simply get multiple parts
bool compile_zero_inst128(IRInstRef inst) {
IRValueRef res_val = /* ... */;
ValueRef res_ref = this->result_ref(res_val);
ValuePartRef res_part = res_ref.part(0);
AsmReg res_reg = res_part.alloc_reg();
ASM(XOR32rr, res_reg, res_reg);
res_part.set_modified();
res_part = res_ref.part(1);
res_reg = res_part.alloc_reg();
ASM(XOR32rr, res_reg, res_reg);
res_part.set_modified();
return true;
}
- in case you only need part 0, you can get the ValueRef and ValuePartRef in an std::pair using result_ref_single
auto [res_ref, res_part] = this->result_ref_single(res_val);
using operands
- assume we have a 64 bit add
- need to get lhs and rhs operand
- similar to result: get ValueRef and then the necessary ValuePartRefs
- when using a value as an operand, use the val_ref function, or val_ref_single
- to get register, use load_to_reg which will also lock the register and reload the value from the stack if necessary
bool compile_add64(IRInstRef inst) {
IRValueRef lhs_val = /* ... */;
IRValueRef rhs_val = /* ... */;
IRValueRef res_val = /* ... */;
auto [lhs_ref, lhs_part] = this->val_ref_single(lhs_val);
auto [rhs_ref, rhs_part] = this->val_ref_single(rhs_val);
auto [res_ref, res_part] = this->result_ref_single(res_val);
AsmReg lhs_reg = lhs_part.load_to_reg();
AsmReg rhs_reg = rhs_part.load_to_reg();
AsmReg res_reg = res_part.alloc_reg();
ASM(MOV64rr, res_reg, lhs_reg);
ASM(ADD64rr, res_reg, rhs_reg);
res_part.set_modified();
return true;
}
reusing operand registers
- the previous encoding is inefficient
- if lhs is not used after the instruction, we could reuse its register ("salvage" the register)
- to do this when possible, we use the into_temporary function
- it will give us an owning reference to the register if the value is dead after this instruction, or make a copy into a new register otherwise
bool compile_add64(IRInstRef inst) {
IRValueRef lhs_val = /* ... */;
IRValueRef rhs_val = /* ... */;
IRValueRef res_val = /* ... */;
auto [lhs_ref, lhs_part] = this->val_ref_single(lhs_val);
auto [rhs_ref, rhs_part] = this->val_ref_single(rhs_val);
auto [res_ref, res_part] = this->result_ref_single(res_val);
ValuePartRef tmp_part = lhs_part.into_temporary();
AsmReg lhs_reg = tmp_part.cur_reg();
AsmReg rhs_reg = rhs_part.load_to_reg();
ASM(ADD64rr, lhs_reg, rhs_reg);
res_part.set_value(std::move(tmp_part));
return true;
}
- note that in this case we do not need to get the result ValueRef before emitting the instructions and can turn the definition into a one-liner: this->result_ref(res_val).part(0).set_value(std::move(tmp_part))
- to salvage either the lhs or the rhs, we can check whether the register can actually be salvaged using can_salvage
- then populate the temporary ValuePartRef from either lhs_part or rhs_part, as in the sketch below
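For a commutative operation like the add above, that check could look like the following sketch (it only relies on can_salvage, into_temporary and cur_reg as described above):
// hedged sketch: salvage whichever operand register can be reused;
// fall back to copying lhs if neither can be salvaged
ValuePartRef tmp_part = (!lhs_part.can_salvage() && rhs_part.can_salvage())
    ? rhs_part.into_temporary()
    : lhs_part.into_temporary();
AsmReg res_reg = tmp_part.cur_reg();
// the remaining operand is loaded and added to res_reg as before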
temporary registers
- sometimes temporary registers are required for computing the result and into_temporary is not sufficient
- they are allocated and freed using ScratchReg
- allocate using alloc, alloc_gp or a specific register using alloc_specific
- use the AsmReg as desired
- the register is freed on destruction of the ScratchReg or when calling reset
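A minimal usage sketch (lhs_reg stands for some operand register obtained as shown above):
// hedged sketch: allocate a GP scratch register for an intermediate result
ScratchReg scratch{this};
AsmReg tmp_reg = scratch.alloc_gp();
ASM(MOV64rr, tmp_reg, lhs_reg); // use the scratch register as desired
ASM(SHR64ri, tmp_reg, 32);
// tmp_reg is released when scratch is destructed (or on scratch.reset())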
Constants using ValRefSpecial
- constants are a relatively common feature
- they might not have a local index assigned by the adaptor
- might want to handle them just as normal values for code that does not need to care
- every time val_ref is called, it will call val_ref_special in your compiler
- check if the value is a constant and then return a ValRefSpecial
- suppose we use the default ValRefSpecial from CompilerBase since we only support 64 bit constants
- then implement val_ref_special like this
std::optional<ValRefSpecial> val_ref_special(IRValueRef value) noexcept {
if (/* value is not a constant */) {
return std::nullopt;
}
u64 constant = /* ... */;
return ValRefSpecial{.mode = 4, .const_data = constant};
}
- since constants are only 64 bit, only part 0 will be accessed for x86-64
- in val_part_ref_special we can then return a constant ValuePartRef which will handle materialization for us
ValuePartRef val_part_ref_special(ValRefSpecial& val_ref, u32 part_idx) noexcept {
assert(part_idx == 0);
return ValuePartRef{val_ref.const_data, 8, Config::GP_BANK};
}
- for constants larger than 64 bits, you will need to allocate them somewhere and then give the ValuePartRef a pointer to the constant data (e.g. for vector constants)
- if you want to store more information about constants, you will need to define your own ValRefSpecial struct (see the sketch after this list)
- LLVM implementation can be used as a reference
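A sketch of what such a struct could look like, respecting the constraints from the warning above (standard layout, u8 mode as the first member with a value greater than 3); the members other than mode are hypothetical:
// hedged sketch: a custom ValRefSpecial for wide constants
struct ValRefSpecial {
  u8 mode = 4; // must be greater than 3
  u32 num_words; // hypothetical: number of 64-bit words in the constant
  const u64* data; // hypothetical: pointer to the constant payload
};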
optimizing constant operands
- you might want to optimize instruction selection if an operand is a constant, since many instructions can take either a register or an immediate operand
- doable by checking if a part is constant using is_const
- then access the constant data using const_data, which returns a std::span<u64>
bool compile_add64(IRInstRef inst) {
IRValueRef lhs_val = /* ... */;
IRValueRef rhs_val = /* ... */;
IRValueRef res_val = /* ... */;
auto [lhs_ref, lhs_part] = this->val_ref_single(lhs_val);
auto [rhs_ref, rhs_part] = this->val_ref_single(rhs_val);
auto [res_ref, res_part] = this->result_ref_single(res_val);
ValuePartRef tmp_part = lhs_part.into_temporary();
AsmReg lhs_reg = tmp_part.cur_reg();
if (rhs_part.is_const()
&& i64(rhs_part.const_data()[0]) == i64(i32(rhs_part.const_data()[0]))
) {
ASM(ADD64ri, lhs_reg, rhs_part.const_data()[0]);
} else {
AsmReg rhs_reg = rhs_part.load_to_reg();
ASM(ADD64rr, lhs_reg, rhs_reg);
}
res_part.set_value(std::move(tmp_part));
return true;
}
- can also optimize if lhs is const and rhs is not
- before calling into_temporary, swap the operands:
if (lhs_part.is_const() && !rhs_part.is_const()) {
std::swap(lhs_ref, rhs_ref);
std::swap(lhs_part, rhs_part);
}
Materializing constants
- sometimes a constant that is not an IR value is needed in a register for a computation
- materializing constants can be tedious (e.g. on AArch64)
- there is a simple helper to do that if you have allocated a ScratchReg
ScratchReg scratch{this};
AsmReg tmp_reg = scratch.alloc_gp();
u64 constant = /* ... */;
this->materialize_constant(constant, Config::GP_BANK, 8, tmp_reg);
Fusing instructions
- sometimes it is beneficial to fuse adjacent IR instructions, e.g. load with zero extend
- TPDE only supports fusing instructions which have not been compiled yet (forward fusing)
- adaptor needs to implement bookkeeping for which instructions have been fused
- compile_inst gets an InstRange as a parameter which can be used to iterate over the following instructions
- check for the specific instruction we want to match, then mark it as fused
- example: load i8 with sx
bool compile_loadi8(IRInstRef inst, InstRange remaining) {
IRValueRef ptr_val = /* ... */;
IRValueRef res_val = /* ... */;
auto [ptr_ref, ptr_part] = this->val_ref_single(ptr_val);
AsmReg ptr_reg = ptr_part.load_to_reg();
if (remaining.from != remaining.to
&& analyzer.liveness_info(u32(this->adaptor->val_local_idx(res_val))).ref_count <= 2
&& /* the next instruction sign-extends the loaded value */) {
// mark the matched instruction as fused in the adaptor's bookkeeping
res_val = /* ... */;
auto [res_ref, res_part] = this->result_ref_single(res_val);
AsmReg res_reg = res_part.alloc_reg();
ASM(MOVSXr64m8, res_reg, FE_MEM( ptr_reg, 0, FE_NOREG, 0));
res_part.set_modified();
} else {
res_val = /* ... */;
auto [res_ref, res_part] = this->result_ref_single(res_val);
AsmReg res_reg = res_part.alloc_reg();
ASM(MOVZXr32m8, res_reg, FE_MEM( ptr_reg, 0, FE_NOREG, 0));
res_part.set_modified();
}
return true;
}
Return
- use RetBuilder to assign returned ValueParts or IRValueRefs to their corresponding return registers
- ValueParts may be used if you need to do some processing before returning them, e.g. zero-/sign-extension with non-power-of-two bitwidths
- afterwards, call RetBuilder::ret to generate the return
- this will internally move the results into the return registers and then call gen_func_epilog and release_regs_after_return
- if you do not need to return anything, you can skip the RetBuilder and call these two functions yourself
[!warning] you always need to call release_regs_after_return if you compile an instruction that terminates a basic block and does not branch to another block
bool compile_ret(IRInstRef inst) {
RetBuilder rb{this, *this->cur_cc_assigner()};
if (/* the function returns a value */) {
IRValueRef ret_val = /* ... */;
rb.add(ret_val);
}
rb.ret();
return true;
}
Branches
- branching mostly handled by the compiler
- use generate_branch_to_block from the architecture compiler to emit an (un)conditional branch; its arguments are architecture-specific
- before calling it, need to call spill_before_branch so that values which live across blocks can be spilled and begin_branch_region which asserts that no value assignments are changed during branching
- after all branches are generated, call end_branch_region to end code generation for branches and release_spilled_regs so the register allocator can free values which can no longer be assumed to be in registers after the branch
[!warning] Never emit a branch to a block other than through generate_branch_to_block
Unconditional branch
- need to get the IRBlockRef to jump to
- straightforward:
bool compile_br(IRInstRef inst) {
IRBlockRef target = /* ... */;
const auto spilled = this->spill_before_branch();
this->begin_branch_region();
this->generate_branch_to_block(Jump::jmp, target, false, true);
this->end_branch_region();
this->release_spilled_regs(spilled);
return true;
}
- the split argument is only relevant for conditional branches
Conditional branch
- a bit more involved but can also be simple
- suppose we jump to the false block only when the lower 32 bits are zero, otherwise to the true block
- need to ask framework whether branch to block needs to be split
bool compile_condbr(IRInstRef inst) {
IRValueRef cond_val = /* ... */;
IRBlockRef true_target = /* ... */;
IRBlockRef false_target = /* ... */;
{
auto [cond_ref, cond_part] = this->val_ref_single(cond_val);
ASM(CMP32ri, cond_part.load_to_reg(), 0);
}
const bool true_needs_split = this->branch_needs_split(true_target);
const auto spilled = this->spill_before_branch();
this->begin_branch_region();
this->generate_branch_to_block(Jump::jne, true_target, true_needs_split, false);
this->generate_branch_to_block(Jump::jmp, false_target, false, true);
this->end_branch_region();
this->release_spilled_regs(spilled);
return true;
}
- we want to optimize this, since we want to avoid a branch to the block that comes directly after
- a bit more involved, but the code can easily be reused
bool compile_condbr(IRInstRef inst) {
IRValueRef cond_val = /* ... */;
IRBlockRef true_target = /* ... */;
IRBlockRef false_target = /* ... */;
{
auto [cond_ref, cond_part] = this->val_ref_single(cond_val);
ASM(CMP32ri, cond_part.load_to_reg(), 0);
}
const bool true_needs_split = this->branch_needs_split(true_target);
const bool false_needs_split = this->branch_needs_split(false_target);
const IRBlockRef next_block = this->analyzer.block_ref(this->next_block());
const auto spilled = this->spill_before_branch();
this->begin_branch_region();
if (next_block == true_target || (next_block != false_target && true_needs_split)) {
this->generate_branch_to_block(this->invert_jump(Jump::jne), false_target, false_needs_split, false);
this->generate_branch_to_block(Jump::jmp, true_target, false, true);
} else {
this->generate_branch_to_block(Jump::jne, true_target, true_needs_split, false);
this->generate_branch_to_block(Jump::jmp, false_target, false, true);
}
this->end_branch_region();
this->release_spilled_regs(spilled);
return true;
}
Emitting function calls
- calls are emitted using the CallBuilder helper or the generate_call helper function in the architecture compiler
- two types of target:
- direct call to symbol
- indirect call to ptr in ValuePartRef
generate_call
- simpler API
- arguments given by their IRValueRef + flags for sign/zero-extension or passed as byval
- results returned in ValueRef
- supports calling vararg functions
- will always use the default C calling convention
bool compile_call_direct(IRInstRef inst) {
IRValueRef res_val = /* ... */;
u32 target_idx = /* ... */;
SymRef target_sym = this->func_syms[target_idx];
tpde::SmallVector<CallArg, 4> args{};
for (IRValueRef call_arg : /* ... */) {
args.push_back(CallArg{call_arg});
}
ValueRef res_ref = this->result_ref(res_val);
this->generate_call(target_sym, args, &res_ref, false);
return true;
}
- there are other ways to get symbols; you can also create your own, e.g. the LLVM back-end does this
CallBuilder
- more complex but more flexible API
- defined in architecture compiler
- takes a reference to a CCAssigner to allocate argument and result registers
- vararg configured in CCAssigner
- if a custom CCAssigner is used (e.g. tpde::x64::CCAssignerSysV), storage for it needs to be allocated and kept live as long as the CallBuilder is live
- for each argument call add_arg with a CallArg (IRValueRef + flags) or with a ValuePart and a CCAssignment which contains more fine-grained information
- then use call to generate the call
- use add_ret to collect the result values
bool compile_call_direct(IRInstRef inst) {
IRValueRef res_val = /* ... */;
u32 target_idx = /* ... */;
SymRef target_sym = this->func_syms[target_idx];
CallBuilder cb{*this, *this->cur_cc_assigner()};
for (IRValueRef call_arg : /* ... */) {
ValueRef vr = this->val_ref(call_arg);
CCAssignment cca;
cb.add_arg(vr.part(0), cca);
}
cb.call(target_sym);
ValueRef res_ref = this->result_ref(res_val);
CCAssignment cca;
cb.add_ret(res_ref.part(0));
return true;
}
Manual stack allocations
- manual allocation of stack slots sometimes needed, e.g. when an instruction needs too many temporaries or manual handling of static allocations
- allocated using allocate_stack_slot, freed using free_stack_slot
[!note] only free when all blocks in which the stack slot could be live have been compiled
ScratchReg scratch{this};
AsmReg reg = scratch.alloc_gp();
i32 slot = this->allocate_stack_slot( 8);
this->spill_reg(reg, slot, 8);
this->load_from_stack(reg, slot, 8, false);
this->free_stack_slot(slot, 8);
Custom Variable Ref Handling
- ValueRefs can be marked as being "variable references"
- i.e. these values are pointers that are cheap to compute on demand and are therefore not spilled when their registers are evicted
- by default, ValueRefs for static stack allocations are classified as variable references
- these also have the stack_variable bit set in the ValueAssignment
- compiler has option to allow user to add custom kinds of variable references, e.g. global value pointers
- set DEFAULT_VAR_REF_HANDLING to false
- then your compiler will need to implement setup_var_ref_assignments and load_address_of_var_reference:
setup_var_ref_assignments can initialize the assignments of values which need to be valid at the beginning of the function using init_variable_ref
load_address_of_var_reference will need to calculate the address of the value the variable reference points to and load it into a register (see the sketch after this list)
- the user can use variable_ref_data in the AssignmentPartRef of the first part to store custom information about the value, e.g. an index into a custom data structure
- you will need to handle access to a custom variable ref assignment in val_ref_special/val_part_ref_special and initialize it using init_variable_ref if necessary
- this can then return a normal ValuePartRef referencing this assignment
- An implementation of this can be found in TPDE-LLVM
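As an illustration, the address load for a global could look roughly like the sketch below. It assumes that variable_ref_data stores an index into a user-maintained symbol table (global_syms, hypothetical) and reuses the GOT-load pattern from the Relocations section; the exact AssignmentPartRef accessor may differ.
// hedged sketch: resolve a custom variable ref (a global) to its address
void load_address_of_var_reference(AsmReg dst, AssignmentPartRef ap) noexcept {
  // variable_ref_data was set up in setup_var_ref_assignments and, by our
  // hypothetical convention, indexes into this->global_syms
  SymRef sym = this->global_syms[ap.variable_ref_data()];
  // load the address from the GOT, as in the Relocations example below
  ASM(MOV64rm, dst, FE_MEM(FE_IP, 0, FE_NOREG, -1));
  this->assembler.reloc_text(sym, R_X86_64_GOTPCREL, this->assembler.text_cur_off() - 4, -4);
}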
Architecture-specific
Calling Conventions
- each architecture defines a set of CCAssigners which handle register/stack allocation for argument and return registers for a calling convention
- your compiler may implement the cur_cc_assigner function that returns the assigner for the current function
- generate_call always uses the default C calling convention
- CallBuilders take a CCAssigner as a constructor argument to specify the calling convention to use
- currently, only SysV/AAPCS is supported
Assembler
- manages data structures for code and data sections, their relocations, symbols and Labels (i.e. function-local symbols)
- handles generating an object file and unwind information
- configured using the compiler config, instance stored in CompilerBase
- currently only x86-64 and AArch64 Linux ELF supported
- unwind info is synchronous
- supports generating C++ exception information
Sections
- various sections for code or data are created
- accessed using SecRefs
- current code section stored in Assembler as current_section
- other sections are accessed using get_{data,bss,tdata,tbss,structor,eh_frame}_section
Symbols
- SymRef used to reference symbols
- various functions to create and define symbols
- SymBinding: three levels to define the linkage of symbols, reflecting ELF semantics
sym_add_undef
SymRef sym_add_undef(std::string_view name, SymBinding binding) noexcept;
Adds an undefined symbol, possibly with a name. Can be defined later using sym_def.
sym_predef_*
SymRef sym_predef_func(std::string_view name, SymBinding binding) noexcept;
SymRef sym_predef_data(std::string_view name, SymBinding binding) noexcept;
SymRef sym_predef_tls(std::string_view name, SymBinding binding) noexcept;
Predefines a symbol for a function, data or TLS object respectively. Must be defined later using sym_def for function or TLS objects and using sym_def_predef_data for data objects.
sym_def_predef_data
void sym_def_predef_data(SecRef sec, SymRef sym, std::span<const u8> data, u32 align, u32 *off) noexcept;
Defines a data object referred to by sym in the section sec with the specified data and alignment. If off is not a nullptr, the offset of the data in the section is written to the u32 pointed to by off.
sym_def_data
SymRef sym_def_data(SecRef sec, std::string_view name, std::span<const u8> data, u32 align, SymBinding binding, u32 *off = nullptr) noexcept;
Shortcut for sym_predef_data followed by sym_def_predef_data.
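For example, defining an 8-byte, 8-aligned global in the data section could look like this sketch (the symbol name, initial bytes and the SymBinding enumerator are assumptions):
// hedged sketch: define a zero-initialized 8-byte data symbol
const u8 init[8] = {};
SymRef sym = this->assembler.sym_def_data(
    this->assembler.get_data_section(), "my_global", init, /*align=*/8,
    SymBinding::GLOBAL);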
Relocations
- assembler can generate relocations
- useful if constant data is needed or globals are used
- usually target-dependent, you need to know what to generate
- example: GOT address load
void load_address_of_global(SymRef global_sym, AsmReg dst) {
ASM(MOV64rm, dst, FE_MEM( FE_IP, 0, FE_NOREG, -1));
this->assembler.reloc_text(global_sym, R_X86_64_GOTPCREL, this->assembler.text_cur_off() - 4, -4);
}
reloc_sec
void reloc_sec(SecRef sec, SymRef sym, u32 type, u64 offset, i64 addend) noexcept;
Adds a relocation of type (architecture-/platform-specific) in section sec at offset to the specified symbol sym with an addend.
reloc_pc32
void reloc_pc32(SecRef sec, SymRef sym, u64 offset, i64 addend) noexcept;
Adds a 32-bit pc-relative relocation in section sec at offset to the specified symbol sym with an addend, in an architecture-independent manner.
reloc_abs
void reloc_abs(SecRef sec, SymRef sym, u64 offset, i64 addend) noexcept;
Adds a 64-bit absolute relocation, i.e. writes the address of sym plus an addend, in section sec at offset, in an architecture-independent manner.
reloc_text
void reloc_text(SymRef sym, u32 type, u64 offset, i64 addend = 0) noexcept;
Adds a relocation to sym in the current text section at offset of type (architecture-/platform-specific) with an optional addend.
Labels
- markers for code in current function
- used for generate_raw_jump in architecture compilers to encode control flow when compiling a single instruction
- create with label_create
- place with label_place
- all control flow within an instruction must converge at the end of it, i.e. do not branch to other blocks from inside instruction-internal control flow
void compile_highest_bit_set(IRInstRef inst) noexcept {
AsmReg in_reg = /* ... */;
AsmReg out_reg = /* ... */;
ScratchReg tmp{this};
AsmReg tmp_reg = tmp.alloc_gp();
Label fin = this->assembler.label_create();
Label cont = this->assembler.label_create();
ASM(XOR32rr, out_reg, out_reg);
ASM(DEC64r, out_reg);
ASM(MOV64rr, tmp_reg, in_reg);
this->label_place(cont);
ASM(CMP64ri, tmp_reg, 0);
this->generate_raw_jump(Jump::jz, fin);
ASM(INC64r, out_reg);
ASM(SHR64ri, tmp_reg, 1);
this->generate_raw_jump(Jump::jmp, cont);
this->label_place(fin);
}
[!warning] Don't emit code that might cause registers to be spilled when in conditional control-flow
Jump Tables
- for switch tables it is beneficial to generate jump tables which encode offsets to labels
[!note] you cannot directly jump to blocks using an offset since the register allocator might need to do PHI moves at the edge. You first have to jump to a label that then branches to the target block
- Assembler has helper function to manually add an unresolved offset to a label
void write_jump_table(Label jump_table, std::span<Label> cases) {
SecRef text_ref = this->text_writer.get_sec_ref();
this->text_writer.align(4);
this->text_writer.ensure_space(4 + 4 * cases.size());
this->label_place(jump_table);
u32 table_off = this->text_writer.offset();
for (Label case_label : cases) {
if (this->assembler.label_is_pending(case_label)) {
this->assembler.add_unresolved_entry(
case_label,
text_ref,
this->text_writer.offset(),
tpde::x64::AssemblerElfX64::UnresolvedEntryKind::JUMP_TABLE
);
this->text_writer.write<u32>(table_off);
} else {
u32 label_off = this->assembler.label_offset(case_label);
this->text_writer.write<i32>((i32)label_off - (i32)table_off);
}
}
}
Compiling and Generating Object File
- When compilation is finished (CompilerBase::compile returns), the Assembler can be used to create an object file or map the compiled code to memory
Generate object
- build_object_file writes out a finished ELF object file to a vector
void compile_elf() {
if (!compiler->compile()) {
return;
}
std::vector<u8> obj = compiler->assembler.build_object_file();
}
Mapping to memory
- For ELF Linux x64/AArch64 there is a helper class ElfMapper which maps the compiled sections directly into memory
- Need to provide a function to resolve undefined symbols
- Get symbol address using get_sym_addr
- On destruction, it automatically frees the memory and deregisters unwind info
[!note] can be reused after reset
void compile_and_map() {
if (!compiler->compile()) {
return;
}
tpde::ElfMapper mapper{};
if (!mapper.map(compiler->assembler, [](std::string_view sym_name) -> void* { /* resolve sym_name to an address */ return nullptr; })) {
return;
}
void* fun_addr = mapper.get_sym_addr(compiler->func_syms[0]);
reinterpret_cast<void (*)()>(fun_addr)();
mapper.reset();
}