TPDE
TPDE-LLVM is a TPDE-based LLVM back-end focusing on fast compilation, targeting x86-64 and AArch64. Typically, compile times are 10–20x faster than the LLVM `-O0` back-end with similar execution time; code size is ~10–30% larger for `-O0` IR and similar for `-O1` IR. The focus is on efficiently supporting a commonly used subset of LLVM IR and target platforms. Many IR features are therefore not supported; in such cases, the intention is to fall back to the full LLVM back-end. Code generated by Clang (`-O0`/`-O1`) will typically compile; `-O2` and higher will typically fail due to unsupported vector operations.
Standalone usage is possible through the tool `tpde-llc` (similar to `llc`), which compiles LLVM IR or bitcode to an ELF object file.
Library usage is possible through `tpde_llvm::LLVMCompiler`, which supports compiling a module to an object file or mapping it into the memory of the current process for JIT execution. The JIT mapper only supports very typical ELF constructs (e.g., no TLS). If this is not sufficient, the object file can also be mapped through LLVM's ORC JIT (see `tpde-llvm/tools/tpde-lli.cpp` for an example).
Note that compilation is likely to modify the module: all constant expressions inside functions are replaced with instruction sequences, and all accesses to thread-local variables are rewritten to use `llvm.threadlocal.address`.
We provide a patch to integrate TPDE-LLVM into Clang/Flang. Apply the patch from the root directory of the repository and add this repository under `clang/lib/CodeGen/tpde2` (e.g., via a symlink). This adds two options to the `clang` and `flang` drivers:
- `-ftpde`: Use TPDE instead of the regular LLVM back-end. Inputs that TPDE can't handle cause a fallback to LLVM and emit a warning.
- `-ftpde-abort`: Abort when the input is not supported; don't fall back to LLVM.

Note: most LLVM-specific code-gen options will be ignored.
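
The difference between the two options can be sketched as a small control-flow model. This is only an illustration of the behavior described above, not code from the Clang patch: `TpdeMode`, `Outcome`, `compile_module`, and both callbacks are hypothetical names, and the warning text is made up.

```cpp
#include <cassert>
#include <functional>
#include <iostream>

// Hypothetical model of the two driver modes; not the actual Clang patch.
enum class TpdeMode {
  Fallback, // corresponds to -ftpde
  Abort     // corresponds to -ftpde-abort
};

enum class Outcome { CompiledWithTpde, CompiledWithLlvm, Aborted };

// `tpde_compile` returns false when the input uses an unsupported feature.
// `llvm_compile` stands in for the regular LLVM back-end.
Outcome compile_module(TpdeMode mode,
                       const std::function<bool()> &tpde_compile,
                       const std::function<void()> &llvm_compile) {
  assert(tpde_compile && llvm_compile && "both callbacks must be set");
  if (tpde_compile())
    return Outcome::CompiledWithTpde;
  if (mode == TpdeMode::Abort)
    return Outcome::Aborted; // no fallback in abort mode
  // Fallback mode: warn, then hand the input to the regular back-end.
  std::cerr << "warning: input not supported by TPDE, falling back to LLVM\n";
  llvm_compile();
  return Outcome::CompiledWithLlvm;
}
```

In this model, the abort mode is the one to use when you want to find out whether a given input is fully covered by TPDE, since a silent fallback would hide unsupported constructs.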
Unsupported features currently include:
- Integer types larger than `i64` except `i128` (`i128` is supported), pointers with non-zero address space, `half`, `bfloat`, `ppc_fp128`, `x86_fp80`, `x86_amx`. Code with x86-64 `long double` needs to be compiled with `-mlong-double-64`.
- `select` on aggregate types other than `{i64, i64}`.
- `bitcast` of values larger than 64 bits.
- (… `seqcst` for `atomicrmw`).
- `fp128`: `fneg`, `fcmp one/ueq`, many intrinsics.
- Computed `goto` (`blockaddress`, `indirectbr`).
- `landingpad` with non-empty `filter` clause.
- (… `llvm.cttz` only for 8/16/32/64-bit).

Vector support has substantial limitations. The main focus is to support constructs that typically occur in unoptimized code that uses intrinsics. Generating high-quality vectorized code would require substantial effort and is therefore a non-goal. As a result, scalar code can perform better than vectorized code, even where vectorization would otherwise be an improvement. For example, `shufflevector` is always lowered to a series of scalar extracts/inserts, and `icmp` in many cases expensively packs the result into a bit vector.
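
To make the cost of these two lowerings concrete, here is a minimal C++ model of what they compute. It is a sketch of the semantics only, not TPDE's generated code: the function names are hypothetical, undefined/poison mask elements are not modeled, and the bit order of the packed comparison result (element `i` in bit `i`) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// shufflevector semantics: a mask index below N selects from `a`,
// an index in [N, 2N) selects from `b`. Each result element costs one
// scalar extract and one scalar insert.
template <typename T, std::size_t N, std::size_t M>
std::array<T, M> scalarized_shuffle(const std::array<T, N> &a,
                                    const std::array<T, N> &b,
                                    const std::array<std::size_t, M> &mask) {
  std::array<T, M> result{};
  for (std::size_t i = 0; i < M; ++i) {
    const std::size_t idx = mask[i];
    assert(idx < 2 * N && "mask index out of range");
    const T elem = idx < N ? a[idx] : b[idx - N]; // scalar extract
    result[i] = elem;                             // scalar insert
  }
  return result;
}

// Element-wise `icmp eq` whose <N x i1> result is packed into an integer.
// Assumption: element i ends up in bit i.
template <typename T, std::size_t N>
std::uint64_t packed_icmp_eq(const std::array<T, N> &a,
                             const std::array<T, N> &b) {
  static_assert(N <= 64, "result must fit into one 64-bit register");
  std::uint64_t bits = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (a[i] == b[i])
      bits |= std::uint64_t{1} << i;
  return bits;
}
```

Both loops run once per element, which is why a shuffle or comparison that a vectorizing back-end would emit as a single instruction can become noticeably more expensive here.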
The only supported element types are `i1`/`i8`/`i16`/`i32`/`i64`/`ptr`/`float`/`double`. `i1` vectors with more than 64 elements are unsupported. Because aggregate types are limited to at most 65535 elements, vectors with more elements are not always supported (see below).
Only certain types (layout-compatible types) are lowered to a layout guaranteed to be compatible with LLVM; these are typically the types defined by the ABI (16-byte non-`i1` vectors on x86-64, 8/16-byte non-`i1` vectors on AArch64). For other types, the in-register layout is often incompatible with LLVM. Such types therefore cannot cross function boundaries as arguments or return values, even for purely internal functions. TPDE's current lowering rules for non-layout-compatible types are:
- `i1` vectors are represented as a compact integer in a single general-purpose register. (LLVM typically promotes these to a larger vector type, e.g. `<16 x i1>` to `<16 x i8>`.)
- `<64 x i8>` behaves like `[4 x <16 x i8>]`. (LLVM typically widens to the next power of two first.)
- `<63 x i8>` behaves like `[63 x i8]`. (LLVM has very complex and type/target-dependent rules for promoting/widening integers.)

Note that the in-memory representation of vectors always matches the representation of LLVM.
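
The classification described above can be approximated with a short compile-time check. This is a rough sketch, not TPDE's implementation: all names are hypothetical, pointers are assumed to be 64 bits wide, the "typically" in the layout-compatibility rule is treated as exact, and the 65535-element aggregate limit is not modeled.

```cpp
#include <cstdint>

enum class Target { X86_64, AArch64 };
enum class Elem { I1, I8, I16, I32, I64, Ptr, Float, Double };
enum class VecClass { Unsupported, LayoutCompatible, CompactI1, OtherLowering };

// Element width in bits. Assumption: pointers are 64 bits on both targets.
constexpr unsigned elem_bits(Elem e) {
  switch (e) {
  case Elem::I1: return 1;
  case Elem::I8: return 8;
  case Elem::I16: return 16;
  case Elem::I32:
  case Elem::Float: return 32;
  case Elem::I64:
  case Elem::Ptr:
  case Elem::Double: return 64;
  }
  return 0;
}

// Classify a fixed-width vector type <count x elem> for the given target.
constexpr VecClass classify(Target target, Elem elem, std::uint32_t count) {
  if (count == 0)
    return VecClass::Unsupported; // LLVM vectors have at least one element
  if (elem == Elem::I1) // compact integer in one general-purpose register
    return count <= 64 ? VecClass::CompactI1 : VecClass::Unsupported;
  const std::uint64_t bytes = std::uint64_t{count} * elem_bits(elem) / 8;
  if (bytes == 16 || (target == Target::AArch64 && bytes == 8))
    return VecClass::LayoutCompatible;
  return VecClass::OtherLowering; // in-register layout may differ from LLVM
}

static_assert(classify(Target::X86_64, Elem::I8, 16) == VecClass::LayoutCompatible,
              "<16 x i8> is a 16-byte vector");
static_assert(classify(Target::X86_64, Elem::I8, 63) == VecClass::OtherLowering,
              "<63 x i8> is not layout-compatible");
static_assert(classify(Target::AArch64, Elem::I16, 4) == VecClass::LayoutCompatible,
              "8-byte vectors are layout-compatible on AArch64 only");
```

Under this model, only types classified as `LayoutCompatible` would be safe to pass as function arguments or return values.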
Support for arithmetic operations, comparisons, casts, and conversions is generally limited. Some operations are only implemented for layout-compatible types. Some operations are unconditionally scalarized, even if a native vector instruction exists. Additionally, the following operations are currently unsupported:
- `<N x i1>` `insertelement`, `extractelement`, and `shufflevector`.
- `getelementptr`.
- `select`.
- `extractelement` with a constant source vector.