Compiler architecture and internals
revive relies on solc, the Ethereum Solidity compiler, as the Solidity frontend to process smart contracts written in Solidity. LLVM, a popular and powerful compiler framework, is used as the compiler backend and does the heavy lifting in terms of optimizations and RISC-V code generation.
revive mainly takes care of lowering the Yul intermediate representation (IR) produced by solc to LLVM IR. This approach provides a good balance between maintaining a high level of Ethereum compatibility, good contract performance and feasible engineering efforts.
resolc
resolc is the overarching compiler driver library and binary.
When compiling a Solidity source file with resolc, the following steps happen under the hood:
- solc is used to lower the Solidity source code into Yul intermediate representation.
- revive lowers the Yul IR into LLVM IR.
- LLVM optimizes the code and emits a RISC-V ELF shared object (through LLD).
- The PolkaVM linker finally links the ELF shared object into a PolkaVM blob.
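This compilation process can be visualized as follows:
Solidity source -> solc -> Yul IR -> revive -> LLVM IR -> LLVM (LLD) -> RISC-V ELF shared object -> PolkaVM linker -> PolkaVM blob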
Reproducible contract builds
Because on-chain contract code is identified via its code blob hash, it is crucial to maintain reproducible contract builds. A given compiler version must reproduce the contract build exactly on every target platform resolc supports via the official binary releases.
To ensure this, we employ the following measures:
- The code generation must be fully deterministic. For example, iterating over a standard HashMap violates this because its randomly seeded hasher makes the iteration order unstable across runs, so it is an invalid operation in revive. To circumvent that, a BTreeMap can be used instead.
- We release fully statically linked resolc binaries. This prevents dynamic linking of potentially differing libraries.
- The only non-bundled dependency is the solc compiler. This is considered fine because the same properties apply to solc.
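The first measure can be illustrated with a minimal sketch. The `emit_symbols` and `sample_entries` helpers and the symbol names are hypothetical and only stand in for any code generation step that walks a map; they are not taken from the revive code base.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for a code generation step: emits one entry per
/// symbol, in whatever order the iterator yields them.
fn emit_symbols<'a>(symbols: impl Iterator<Item = (&'a String, &'a u32)>) -> String {
    symbols
        .map(|(name, offset)| format!("{name}@{offset}"))
        .collect::<Vec<_>>()
        .join(";")
}

/// Hypothetical input data, inserted in a fixed order.
fn sample_entries() -> Vec<(String, u32)> {
    vec![
        ("transfer".to_string(), 0),
        ("approve".to_string(), 32),
        ("balanceOf".to_string(), 64),
    ]
}

fn main() {
    // HashMap: the default hasher is randomly seeded, so the iteration
    // order (and hence the emitted output) may differ between runs.
    let unordered: HashMap<String, u32> = sample_entries().into_iter().collect();

    // BTreeMap: iteration is always sorted by key, so the output is stable.
    let ordered: BTreeMap<String, u32> = sample_entries().into_iter().collect();

    println!("HashMap  (order may vary): {}", emit_symbols(unordered.iter()));
    println!("BTreeMap (stable order):   {}", emit_symbols(ordered.iter()));
}
```

If such output feeds into the emitted code, only the BTreeMap variant yields the same blob, and therefore the same code hash, on every run.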
The revive compiler libraries
The main compiler logic is implemented in the revive-yul and revive-llvm-context crates.
The Yul library implements a lexer and parser and lowers the resulting tree into LLVM IR. It does so by emitting LLVM IR through the LLVM builder and our own revive-llvm-context crate, which encapsulates the code generation logic (decoupled from the parser).
The Yul library also implements a simple visitor interface (see visitor.rs). If you want to work with the AST, it is strongly recommended to implement visitors. The LLVM code generation is implemented using a dedicated trait for historical reasons only.
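To give an idea of what working with a visitor looks like, here is a minimal sketch. The `Expr` type, the `Visitor` trait, `walk_expr` and `CallCounter` are all hypothetical and heavily simplified; they are not the actual interface in visitor.rs, which should be consulted for the real trait.

```rust
use std::collections::BTreeMap;

/// Hypothetical, heavily simplified Yul-like expression tree
/// (not the revive-yul AST types).
#[allow(dead_code)]
enum Expr {
    Literal(u64),
    Identifier(String),
    Call { name: String, args: Vec<Expr> },
}

/// Hypothetical visitor trait: the default method just walks the children,
/// so implementors only override what they care about.
trait Visitor {
    fn visit_expr(&mut self, expr: &Expr) {
        walk_expr(self, expr);
    }
}

/// Visits all child expressions of `expr`.
fn walk_expr<V: Visitor + ?Sized>(visitor: &mut V, expr: &Expr) {
    if let Expr::Call { args, .. } = expr {
        for arg in args {
            visitor.visit_expr(arg);
        }
    }
}

/// Example analysis: counts how often each function is called.
/// A BTreeMap keeps the result deterministic when iterated.
#[derive(Default)]
struct CallCounter {
    calls: BTreeMap<String, usize>,
}

impl Visitor for CallCounter {
    fn visit_expr(&mut self, expr: &Expr) {
        if let Expr::Call { name, .. } = expr {
            *self.calls.entry(name.clone()).or_insert(0) += 1;
        }
        walk_expr(self, expr);
    }
}

/// Builds the tree for `add(mload(0), add(x, 1))`.
fn sample_tree() -> Expr {
    Expr::Call {
        name: "add".to_string(),
        args: vec![
            Expr::Call { name: "mload".to_string(), args: vec![Expr::Literal(0)] },
            Expr::Call {
                name: "add".to_string(),
                args: vec![Expr::Identifier("x".to_string()), Expr::Literal(1)],
            },
        ],
    }
}

fn main() {
    let mut counter = CallCounter::default();
    counter.visit_expr(&sample_tree());
    for (name, count) in &counter.calls {
        println!("{name}: {count}");
    }
}
```

The analysis lives entirely in `CallCounter`, so new passes can be added without touching the tree types or the traversal logic.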
EVM heap memory
The EVM provides contracts with a linear, byte-addressable heap memory. PVM doesn’t offer a similar API. Hence the emitted contract code emulates the linear EVM heap memory using a static byte buffer. Data inside this byte buffer is kept big endian for EVM compatibility reasons (unaligned access is allowed and makes optimizing this non-trivial).
Unlike with the EVM, where heap memory usage is gas metered, our heap size is static (the size is user controllable via a setting flag). The compiler emits bounds checks to prevent out-of-bounds accesses.
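The idea can be sketched in plain Rust as follows. This is only an illustration under simplifying assumptions, not the code the compiler emits: `Heap`, `HEAP_SIZE`, `store_word` and `load_word` are hypothetical names, the heap size is an arbitrary example value, and values are limited to 64 bits for brevity even though EVM words are 256 bits wide.

```rust
/// Hypothetical heap size for this sketch.
const HEAP_SIZE: usize = 1024;
/// EVM words are 32 bytes (256 bits) wide.
const WORD_SIZE: usize = 32;

#[derive(Debug, PartialEq)]
struct OutOfBounds;

/// Emulated linear heap memory backed by a fixed-size byte buffer.
struct Heap {
    bytes: [u8; HEAP_SIZE],
}

impl Heap {
    fn new() -> Self {
        Self { bytes: [0; HEAP_SIZE] }
    }

    /// Bounds check: the whole word must fit inside the buffer.
    fn word_range(offset: usize) -> Result<std::ops::Range<usize>, OutOfBounds> {
        let end = offset.checked_add(WORD_SIZE).ok_or(OutOfBounds)?;
        if end > HEAP_SIZE {
            return Err(OutOfBounds);
        }
        Ok(offset..end)
    }

    /// Stores `value` as a big-endian 256-bit word at an arbitrary
    /// (possibly unaligned) byte offset.
    fn store_word(&mut self, offset: usize, value: u64) -> Result<(), OutOfBounds> {
        let range = Self::word_range(offset)?;
        let mut word = [0u8; WORD_SIZE];
        // Big endian: the least significant bytes go last.
        word[WORD_SIZE - 8..].copy_from_slice(&value.to_be_bytes());
        self.bytes[range].copy_from_slice(&word);
        Ok(())
    }

    /// Loads the low 64 bits of the big-endian word at `offset`
    /// (the upper 192 bits are ignored in this sketch).
    fn load_word(&self, offset: usize) -> Result<u64, OutOfBounds> {
        let range = Self::word_range(offset)?;
        let mut low = [0u8; 8];
        low.copy_from_slice(&self.bytes[range][WORD_SIZE - 8..]);
        Ok(u64::from_be_bytes(low))
    }
}

fn main() {
    let mut heap = Heap::new();
    // Unaligned offsets are allowed, just like in the EVM.
    heap.store_word(3, 0xdead_beef).unwrap();
    println!("loaded: {:#x}", heap.load_word(3).unwrap());
    // An access that crosses the end of the buffer is rejected.
    println!("out of bounds: {:?}", heap.load_word(HEAP_SIZE - 31));
}
```

Because the target is little endian, every such access involves a byte order conversion, and because offsets may be unaligned, the conversion cannot simply be replaced by aligned native loads and stores.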
The LLVM dependency
LLVM is a special non-Rust dependency. We access its builder interface via the inkwell wrapper crate.
We use upstream LLVM, but build and release our own custom builds. We require the compiler builtins built specifically for the PVM rv64emacb target, and we always leave assertions enabled. Furthermore, we need cross builds because resolc itself targets emscripten and musl. The revive-llvm-builder functions as a cross-platform build script and is used to build and release the LLVM dependency.
We also maintain the lld-sys crate for interfacing with LLD. LLD, the LLVM linker, is used during the compilation process, but we don’t want to distribute it as another binary alongside resolc.
Custom optimizations
At the moment, no significant custom optimizations are implemented. Thus, we are missing some optimization opportunities that neither solc nor LLVM can realize, because both lack domain-specific knowledge about the semantics of our target environment. Furthermore, solc optimizes for EVM gas and for a target machine that differs fundamentally from ours (the EVM is a big-endian 256-bit stack machine, whereas PVM is a little-endian 64-bit RISC architecture). To capture these missed optimization opportunities, we have started working on an additional IR layer between Yul and LLVM.