Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Compiler architecture and internals

revive relies on solc, the Ethereum Solidity compiler, as the Solidity frontend to process smart contracts written in Solidity. LLVM, a popular and powerful compiler framework, is used as the compiler backend and does the heavy lifting in terms of optimizitations and RISC-V code generation.

revive mainly takes care of lowering the Yul intermediate representation (IR) produced by solc to LLVM IR. This approach provides a good balance between maintaining a high level of Ethereum compatibility, good contract performance and feasible engineering efforts.

resolc

resolc is the overarching compiler driver library and binary.

When compiling a Solidity source file with resolc, the following steps happen under the hood:

  1. solc is used to lower the Solidity source code into YUL intermediate representation.
  2. revive lowers the YUL IR into LLVM IR.
  3. LLVM optimizes the code and emits a RISC-V ELF shared object (through LLD).
  4. The PolkaVM linker finally links the ELF shared object into a PolkaVM blob.

This compilation process can be visualized as follows:

Architecture Overview

Reproducible contract builds

Because on-chain contract code is identified via its code blob hash, it is crucial to maintain reproducible contract builds. A given compiler version must reproduce the contract build exactly on every target platform resolc supports via the official binary releases.

To ensure this, we employ the following measures:

  • The code generation must be fully deterministic. For example iterating over standard HashMap invalidates this due to its internal state, making it an invalid operation in revive. To circumvent that, a BTreeMap can be used instead.
  • We release fully statically linked resolc binaries. This prevents dynamic linking of potentially differentiating libraries.
  • The only non-bundled dependency is the solc compiler. This is considered fine because the same properties apply to solc.

The revive compiler libraries

The main compiler logic is implemented in the revive-yul and revive-llvm-context crates.

The Yul library implements a lexer and parser and lowers the resulting tree into LLVM IR. It does so by emitting LL using the LLVM builder and our own revive-llvm-context compiler context crate. The revive LLVM context crate encapsulates code generation logic (decoupled from the parser).

The Yul library also implements a simple visitor interface (see visitor.rs). If you want to work with the AST, it is strongly recommended to implement visitors. The LLVM code generation is implemented using a dedicated trait for historical reasons only.

EVM heap memory

PVM doesn’t offer a similar API. Hence the emitted contract code emulates the linear EVM heap memory using a static byte buffer. Data inside this byte buffer is kept big endian for EVM compatibility reasons (unaligned access is allowed and makes optimizing this non-trivial).

Unlike with the EVM, where heap memory usage is gas metered, our heap size is static (the size is user controllable via a setting flag). The compiler emits bound checks to prevent overflows.

The LLVM dependency

LLVM is a special non Rust dependency. We interface its builder interface via the inkwell wrapper crate.

We use upstream LLVM, but release and use our custom builds. We require the compiler builtins specifically built for the PVM rv64emacb target and always leave assertions on. Furthermore, we need cross builds because resolc itself targets emscripten and musl. The revive-llvm-builer functions as a cross-platform build script and is used to build and release the LLVM dependency.

We also maintain the lld-sys crate for interfacing with LLD. The LLVM linker is used during the compilation process, but we don’t want to distribute another binary.

Custom optimizations

At the moment, no significant custom optimizations are implemented. Thus, we are missing some optimization opportunities that neither solc nor LLVM can realize (due to their lack of domain specific knowledge about the semantics of our target environment). Furthermore, solc optimizes for EVM gas and a target machine orthogonal to our target (BE 256-bit stack machine EVM vs. 64-bit LE RISC architecture PVM). We have started working on an additional IR layer between Yul and LLVM to capture missed optimization opportunities, though.