TPDE
TPDE-LLVM

TPDE-LLVM is a TPDE-based LLVM back-end focused on fast compilation for x86-64 and AArch64. Compile times are typically 10–20x faster than the LLVM -O0 back-end at similar execution time; code size is ~10-30% larger for -O0 IR and similar for -O1 IR. The focus is on efficiently supporting a commonly used subset of LLVM-IR and target platforms, so many IR features are not supported; in such cases, the intention is to fall back to the full LLVM back-end. Code generated by Clang at -O0/-O1 will typically compile; -O2 and higher will typically fail due to unsupported vector operations.

Usage

Standalone usage is possible through the tool tpde-llc (similar to llc), which compiles LLVM IR or bitcode to an ELF object file.

Library usage is possible through tpde_llvm::LLVMCompiler, which supports compiling a module to an object file or mapping it into the memory of the current process for JIT execution. The JIT mapper only supports very typical ELF constructs (e.g., no TLS). If this is not sufficient, the object file can also be mapped through LLVM's ORC JIT (see tpde-llvm/tools/tpde-lli.cpp for an example).

llvm::Triple triple(mod->getTargetTriple());
std::unique_ptr<tpde_llvm::LLVMCompiler> compiler = tpde_llvm::LLVMCompiler::create(triple);
std::vector<uint8_t> buf;
if (compiler && compiler->compile_to_elf(*mod, buf)) {
  // Compilation successful, buf contains object file
} else {
  // Triple unsupported or compilation failed
}

Note that compilation is likely to modify the module. All constant expressions inside functions are replaced with instruction sequences and all accesses to thread-local variables are rewritten to use llvm.threadlocal.address.
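The intended pattern of trying the fast back-end first and falling back to the full LLVM back-end on failure can be sketched as follows. This is a minimal, self-contained illustration and not part of the TPDE-LLVM API: the names Backend, CompileFn, and compile_with_fallback are hypothetical, and the two callbacks stand in for a TPDE attempt (such as the compile_to_elf call above) and a regular LLVM code generation path. Clearing the buffer after a failed attempt is a defensive assumption; the source does not state what the buffer contains on failure.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical names for illustration only; not part of TPDE-LLVM.
enum class Backend { Fast, Full, None };

// A compile step writes an object file into the buffer and reports success.
using CompileFn = std::function<bool(std::vector<std::uint8_t> &)>;

// Try the fast back-end first; on failure, discard any partial output and
// retry with the full back-end. Returns which back-end produced the result.
Backend compile_with_fallback(const CompileFn &fast, const CompileFn &full,
                              std::vector<std::uint8_t> &out) {
  out.clear();
  if (fast && fast(out)) {
    return Backend::Fast;
  }
  out.clear(); // assumption: a failed attempt may leave partial data behind
  if (full && full(out)) {
    return Backend::Full;
  }
  out.clear();
  return Backend::None;
}
```

Because a TPDE attempt may already have modified the module, the fallback back-end in such a setup would operate on the modified IR.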

Integration Into Clang/Flang

We provide a patch to integrate TPDE-LLVM into Clang/Flang. Apply the patch from the root directory of the repository and add this repository under clang/lib/CodeGen/tpde2 (e.g., via a symlink). This adds two options to the clang and flang drivers:

  • -ftpde: Use TPDE instead of the regular LLVM back-end. Inputs that TPDE can't handle cause a fallback to LLVM and a warning.
  • -ftpde-abort: Abort when the input is not supported instead of falling back to LLVM.

Note: most LLVM-specific code-gen options will be ignored.

Unsupported Features

Unsupported features currently include:

  • Targets other than x86-64-v1/AArch64 (ARMv8.1) (Linux) ELF.
  • Code models other than Small-PIC.
  • Scalar types: integer types larger than i64 other than i128 (which is supported), pointers with non-zero address space, half, bfloat, ppc_fp128, x86_fp80, x86_amx. Code with x86-64 long double needs to be compiled with -mlong-double-64.
  • Aggregate types with in total more than 65535 elements.
  • select on aggregate types other than {i64, i64}.
  • bitcast of values larger than 64 bits.
  • Atomic operations might use a stronger consistency than required (e.g., always seqcst for atomicrmw).
  • Calling conventions other than the C calling convention (SysV on x86-64, AAPCS on AArch64).
  • fp128: fneg, fcmp one/ueq, many intrinsics.
  • Computed goto (blockaddress, indirectbr).
  • landingpad with non-empty filter clause.
  • Many intrinsics, and some intrinsics are only implemented for commonly used types (e.g., llvm.cttz only for 8/16/32/64-bit).
  • IFuncs.
  • Various forms of constant expressions in global initializers.
  • Non-empty inline assembly.
  • Full asynchronous unwind info (frame info only correct in prologue and at call sites).
  • Several corner cases that we haven't encountered so far.

Vector Support

Vector support has some substantial limitations. The main focus is to support constructs that typically occur in unoptimized code that uses intrinsics. Generating high-quality vectorized code would require substantial effort and is therefore a non-goal. As a result, vectorized code that would otherwise be an improvement can perform worse than scalar code. For example, shufflevector is always lowered to a series of scalar extracts/inserts, and icmp in many cases expensively packs the result into a bit vector.
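To make the cost of these two lowerings concrete, the following sketch models them in plain C++. It is not TPDE's implementation: the function names are hypothetical, undefined mask lanes (negative indices) are simply left as zero, mask indices are assumed to be below 2N, and mapping lane i to bit i of the packed result is an assumption made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Model of a scalarized shufflevector: each result lane is produced by one
// scalar extract from the concatenation of a and b, followed by one scalar
// insert. Precondition (assumed): every non-negative mask index is < 2 * N.
template <typename T, std::size_t N, std::size_t M>
std::array<T, M> shuffle_scalarized(const std::array<T, N> &a,
                                    const std::array<T, N> &b,
                                    const std::array<int, M> &mask) {
  std::array<T, M> result{};
  for (std::size_t i = 0; i < M; ++i) {
    if (mask[i] < 0) {
      continue; // undefined lane; left as zero in this model
    }
    const std::size_t idx = static_cast<std::size_t>(mask[i]);
    const T elem = idx < N ? a[idx] : b[idx - N]; // scalar extract
    result[i] = elem;                             // scalar insert
  }
  return result;
}

// Model of an icmp eq whose <N x i1> result is packed into one integer.
// Assumption: lane i is stored in bit i.
template <typename T, std::size_t N>
std::uint64_t icmp_eq_packed(const std::array<T, N> &a,
                             const std::array<T, N> &b) {
  static_assert(N <= 64, "packed result must fit into 64 bits");
  std::uint64_t bits = 0;
  for (std::size_t i = 0; i < N; ++i) {
    if (a[i] == b[i]) {
      bits |= std::uint64_t{1} << i;
    }
  }
  return bits;
}
```

Both loops perform work proportional to the number of lanes, which is why a single native shuffle or compare instruction can be considerably faster.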

Supported Types

The only supported element types are i1/i8/i16/i32/i64/ptr/float/double. i1 vectors with more than 64 elements are unsupported. Because aggregate types are limited to at most 65535 elements, vectors with more elements are not always supported (see below).

Type Compatibility

Only certain types, the layout-compatible types, are lowered to a layout that is guaranteed to be compatible with LLVM. These are typically the types defined by the ABI (16-byte non-i1 vectors on x86-64, 8/16-byte non-i1 vectors on AArch64). For other types, the in-register layout is often incompatible with LLVM. Such types therefore cannot cross function boundaries as arguments or return values, even for purely internal functions. TPDE's current lowering rules for non-layout-compatible types are:

  • i1 vectors are represented as a compact integer in a single general-purpose register. (LLVM typically promotes these to a larger vector type, e.g. <16 x i1> to <16 x i8>.)
  • Vectors whose size is a multiple of a directly supported vector type with the same element type are lowered as multiple parts of that type, e.g. <64 x i8> behaves like [4 x <16 x i8>]. (LLVM typically widens to the next power of two first.)
  • Other vector types are scalarized, e.g. <63 x i8> behaves like [63 x i8]. (LLVM has very complex and type/target-dependent rules for promoting/widening integers.)

Note that the in-memory representation of vectors always matches the representation of LLVM.
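The rules above can be summarized as a small classification function. This is a sketch of the rules as described in this section, not TPDE's actual code: all names are hypothetical, pointers are assumed to be 64 bits wide, AArch64 is assumed to prefer 16-byte parts over 8-byte parts when both divide the vector size, and the 65535-element aggregate limit is assumed to apply to the number of parts or scalarized elements.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not part of TPDE-LLVM.
enum class Target { X86_64, AArch64 };
enum class Elem { I1, I8, I16, I32, I64, Ptr, Float, Double };
enum class Lowering { Unsupported, LayoutCompatible, BitMask, SplitIntoParts, Scalarized };

struct VectorLowering {
  Lowering kind;
  std::uint64_t parts; // number of registers, parts, or scalar elements
};

// Element width in bits (assumption: 64-bit pointers).
constexpr unsigned elem_bits(Elem e) {
  switch (e) {
  case Elem::I1: return 1;
  case Elem::I8: return 8;
  case Elem::I16: return 16;
  case Elem::I32:
  case Elem::Float: return 32;
  case Elem::I64:
  case Elem::Ptr:
  case Elem::Double: return 64;
  }
  return 0;
}

// Classify how a vector of `count` elements of type `e` would be lowered.
VectorLowering classify(Target t, Elem e, std::uint64_t count) {
  constexpr std::uint64_t kMaxAggregateElems = 65535;
  if (count == 0) {
    return {Lowering::Unsupported, 0};
  }
  if (e == Elem::I1) {
    // i1 vectors: compact integer in one general-purpose register.
    if (count > 64) return {Lowering::Unsupported, 0};
    return {Lowering::BitMask, 1};
  }
  const std::uint64_t bytes = count * (elem_bits(e) / 8);
  const bool aarch64 = t == Target::AArch64;
  if (bytes == 16 || (aarch64 && bytes == 8)) {
    return {Lowering::LayoutCompatible, 1};
  }
  std::uint64_t parts = 0;
  if (bytes % 16 == 0) {
    parts = bytes / 16; // e.g. <64 x i8> -> 4 parts of <16 x i8>
  } else if (aarch64 && bytes % 8 == 0) {
    parts = bytes / 8; // assumption: 8-byte parts on AArch64
  }
  if (parts != 0) {
    if (parts > kMaxAggregateElems) return {Lowering::Unsupported, 0};
    return {Lowering::SplitIntoParts, parts};
  }
  // Everything else is scalarized, e.g. <63 x i8> -> [63 x i8].
  if (count > kMaxAggregateElems) return {Lowering::Unsupported, 0};
  return {Lowering::Scalarized, count};
}
```

For example, under these assumptions classify(Target::X86_64, Elem::I8, 64) yields SplitIntoParts with four parts, while the same call with 63 elements yields Scalarized with 63 elements. Only the LayoutCompatible case corresponds to types that can safely cross function boundaries.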

Operations

Support for arithmetic operations, comparisons, casts, and conversions is generally limited. Some operations are only implemented for layout-compatible types. Some operations are unconditionally scalarized, even if a native vector instruction exists. Additionally, the following operations are currently unsupported:

  • <N x i1> insertelement, extractelement, and shufflevector.
  • Vector getelementptr.
  • Vector-predicated select.
  • extractelement with a constant source vector.
  • Most vector intrinsics.