Welcome

Hello and a warm welcome to the revive Solidity compiler book!

Warning

Solidity on PVM is running on the pallet-revive runtime. This introduces observable semantic differences in comparison with the EVM.

Study the differences section carefully. Ignoring these differences may lead to defunct contracts.

Notable examples:

  • The 63/64 gas rule isn’t implemented in the pallet (introduces potential DoS vector when calling other contracts)
  • Contract instantiation works differently (by hash instead of by code)
  • The gas model implemented by pallet-revive differs from Ethereum
  • The heap size is fixed instead of gas-metered and there’s a fixed amount of stack size (contracts working fine on EVM may trap on PVM)

Target audience

  • Solidity dApp developers should read the user guide. Solidity on PolkaVM introduces important differences to EVM which should be well understood.
  • Contributors will find the developer guide helpful for getting up to speed.

Other Polkadot contracts resources

Head to contracts.polkadot.io for more general information about contracts on Polkadot.

About

This mdBook documents the revive Solidity compiler project. The content is found under book/. Run make book to observe changes.

resolc user guide

resolc is a Solidity v0.8 compiler for Polkadot native smart contracts. Solidity compiled with resolc targets PolkaVM (PVM). Thanks to additional compiler optimizations and the PVM JIT, contract code can execute much faster than the EVM equivalent. resolc supports almost all Solidity v0.8 features including inline assembly, offering a high level of compatibility with the Ethereum Solidity reference implementation.

revive vs. resolc nomenclature

revive is the name of the overarching “Solidity to PolkaVM” compiler project, which contains multiple components (for example the Yul parser, the code generation library, the resolc executable itself, and many more things).

resolc is the name of the compiler driver executable, combining many revive components in a single and easy to use binary application.

In other words, revive is the whole compiler infrastructure (more like LLVM) and resolc is a user-facing single-entrypoint compiler frontend (more like clang).

Installation

Building Solidity contracts for PolkaVM requires installing the following two compilers:

resolc binary releases

resolc is supported on all major operating systems and installation is straightforward. Please find our binary releases for the following platforms:

  • Linux (MUSL)
  • MacOS (universal)
  • Windows
  • Wasm via emscripten

Installing the solc dependency

resolc uses solc during the compilation process. Please refer to the Ethereum Solidity documentation for installation instructions.

revive NPM package

We distribute the revive compiler as a Node.js module.

Building resolc from source

Please follow the build instructions in the revive README.md.

CLI usage

We aim to keep the resolc CLI usage close to solc. There are a few things and options worthwhile to know about in resolc which do not exist in the Ethereum world. This chapter explains those in more detail than the CLI help message.

Tip

For the complete help about CLI options, please see resolc --help.

LLVM optimization levels

-O, --optimization <OPTIMIZATION>

resolc exposes the optimization level setting for the LLVM backend. The performance and size of compiled contracts vary widely between different optimization levels. (This is independent of --newyork, which selects the IR lowering pipeline.)

Valid levels are the following:

  • 0: No optimizations are applied.
  • 1: Basic optimizations for execution time.
  • 2: Advanced optimizations for execution time.
  • 3: Aggressive optimizations for execution time.
  • s: Optimize for code size.
  • z: Aggressively optimize for code size.

By default, -Oz is applied.

newyork IR pipeline

--newyork

Enables the newyork optimizer to reduce compiled contract code size, by routing Yul lowering through the experimental newyork IR pipeline instead of the standard Yul-to-LLVM path. Composes with --yul, --combined-json, and the default Solidity mode. In standard JSON mode this flag is rejected; enable the pipeline via the settings.polkavm.newyork input field instead. Off by default.

Stack size

--stack-size <STACK_SIZE>

PVM is a register machine with a traditional stack memory space for local variables. This controls the total amount of stack space the contract can use.

You are incentivized to keep this value as small as possible:

  1. Increasing the stack size will increase gas costs due to increased startup costs.
  2. The stack size contributes to the total memory size a contract can use, which includes the contract’s code size.

Default value: 131072

Warning

If the contract uses more stack memory than configured, it will compile fine but eventually revert execution at runtime!

Heap size

--heap-size <HEAP_SIZE>

Unlike the EVM, PVM lacks dynamic memory metering, so PVM contracts emulate the EVM heap memory with a static buffer. Consequently, instead of unbounded memory with quadratically growing gas costs, PVM contracts have a finite amount of memory available at constant gas cost.
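To make the contrast concrete, the following Rust sketch computes what the EVM would charge for memory of a given size, using the memory cost formula from the Ethereum Yellow Paper (3 gas per 32-byte word plus words²/512). It only illustrates the EVM side of the comparison; the function and constant names are hypothetical and not part of resolc or pallet-revive.

```rust
/// Size of an EVM memory word in bytes.
const EVM_WORD_SIZE: u64 = 32;
/// Linear gas cost per memory word in the EVM memory cost formula.
const GAS_PER_WORD: u64 = 3;
/// Divisor of the quadratic term in the EVM memory cost formula.
const QUADRATIC_DIVISOR: u64 = 512;

/// Total EVM gas charged for expanding memory to `size_in_bytes`.
/// Hypothetical helper for illustration only.
pub fn evm_memory_expansion_cost(size_in_bytes: u64) -> u64 {
    // Memory is charged in whole 32-byte words, rounded up.
    let words = (size_in_bytes + EVM_WORD_SIZE - 1) / EVM_WORD_SIZE;
    // Linear term plus quadratic term.
    GAS_PER_WORD * words + words * words / QUADRATIC_DIVISOR
}
```

For a buffer of 131072 bytes (the default heap size) this yields 45056 gas on the EVM; doubling the size to 262144 bytes yields 155648 gas, more than three times as much. On PVM the buffer is allocated up front instead, so its size affects startup cost rather than per-access cost.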

You are incentivized to keep this value as small as possible:

  1. Increasing the heap size will increase startup costs.
  2. The heap size contributes to the total memory size a contract can use, which includes the contract’s code size.

Default value: 131072

Warning

If the contract uses more heap memory than configured, it will compile fine but eventually revert execution at runtime!

solc

--solc <SOLC>

Specify the path to the solc executable. By default, the one in ${PATH} is used.

Debug artifacts

--debug-output-dir <DEBUG_OUTPUT_DIRECTORY>

Dump all intermediary compiler artifacts to files in the specified directory. This includes the Yul IR, optimized and unoptimized LLVM IR, the ELF object and the PVM assembly. When the newyork pipeline is active, the newyork IR is additionally dumped (the final IR, a pre-late-pass snapshot, and heap and memory optimization data). Useful for debugging and development purposes.

Debug info

-g

Generate source based debug information in the output code file. Useful for debugging and development purposes and disabled by default.

Deploy time linking

--link [--libraries <LIBRARIES>] <INPUT_FILES>

In Solidity, 3 things can happen with libraries:

  1. They are not externally callable and thus can be inlined.
    1. The solc Solidity optimizer inlines those (usually the case). Note: resolc always activates the solc Solidity optimizer.
    2. If the solc Solidity optimizer is disabled or for some reason fails to inline them (both rare), they are not inlined and require linking.
  2. They are externally callable but still linked at compile time. This is the case if at compile time the library address is known (i.e. --libraries supplied in CLI or the corresponding setting in STD JSON input).
  3. They are linked at deploy time. This happens when the compiler does not know the library address (i.e. --libraries flag is missing or the provided libraries are incomplete, same for STD JSON input). This case is rare because it’s discouraged and should never be used by production dApps.

In cases 1.2 and 3:

  • Some of the produced code blobs will be in the “unlinked” raw ELF object format and not yet deployable.
  • To make them deployable, they need to be “linked” (done using the resolc --link linker mode explained below).
  • The compiler emitted DELEGATECALL instructions to call non-inlined (unlinked) libraries. The contract deployer must make sure to deploy any libraries prior to contract deployment.

Warning

Using deploy time linking is officially discouraged, mainly because bytecode hashes change after the fact. We decided to support it in resolc regardless, due to popular request.

Similar to how it works in solc, --libraries may be used to provide libraries during linking mode.

Unlike with solc, where linking implies a simple string substitution mechanism, resolc needs to resolve actual missing ELF symbols. This is due to how factory dependencies work in PVM. As a consequence, it isn’t sufficient to just provide the unlinked blobs to the linker. Instead, they must be provided in the exact same directory structure in which the Solidity source code was found at compile time.

Example:

  • The contract src/foo/bar.sol:Bar is involved in deploy time linking. It may be a factory dependency.
  • The contract blob needs to be provided inside a relative src/foo/ directory to --link. Otherwise symbol resolution may fail.

Note

Tooling is supposed to take care of this. In the future, we may append explicit linkage data to simplify the deploy time linking feature.

JS NPM package

The resolc compiler driver is published as an NPM package under @parity/resolc.

It’s usable from Node.js code or directly from the command line:

npx @parity/resolc@latest --bin crates/integration/contracts/flipper.sol -o /tmp/out

Note

While the npm package makes a nice portable option, it doesn’t expose all options.

Tooling integration

resolc achieved successful integration with a variety of third party developer tools.

Solidity toolkits

Support for resolc is available in forks of the hardhat and foundry Solidity toolkits:

Compiler explorer

resolc is available on godbolt.org for the Solidity and Yul input languages. See also the announcement post on the forum.

Remix IDE

There is a Remix IDE fork with resolc support at remix.polkadot.io. Unfortunately, it is no longer actively maintained (there might be bugs and outdated resolc versions).

Standard JSON interface

The revive compiler is mostly compatible with the solc standard JSON interface. There are a few differences and additional (PVM related) input configurations:

The settings.polkavm object

Used to configure PVM specific compiler settings.

settings.polkavm.debugInformation

A boolean value that enables debug information. Corresponds to resolc -g.

settings.polkavm.newyork

A boolean value that enables the experimental newyork IR pipeline for Yul lowering. Corresponds to resolc --newyork. Off by default.

The output JSON includes which pipeline actually ran, via the top-level resolc_pipeline field ("newyork" or "yul" for the standard Yul-to-LLVM pipeline).

The settings.polkavm.memoryConfig object

Used to apply PVM specific memory configuration settings.

settings.polkavm.memoryConfig.heapSize

A numerical value that configures the contract heap size. Corresponds to resolc --heap-size.

settings.polkavm.memoryConfig.stackSize

A numerical value that configures the contract stack size. Corresponds to resolc --stack-size.

The settings.optimizer object

The settings.optimizer object is augmented with support for PVM specific optimization settings.

settings.optimizer.mode

A single char value to configure the LLVM optimizer settings. Corresponds to resolc -O.

settings.llvmArguments

Specifies arbitrary command line arguments passed to LLVM initialization. Used mainly for development and debugging purposes.

The settings.outputSelection object

Used to select desired outputs.

The “all” (*) wildcard

resolc supports the “all” (*) wildcard for the file-level (first-level) and contract-level (second-level) keys. A file-level key can be either the wildcard or a specific file name, whereas the contract-level key can only be the wildcard for robustness reasons.

Thus, output can be requested in 2 ways:

// All files and all contracts:
{
  "settings": {
    "outputSelection": {
      "*": {
        "*": [/* specific contract-level output fields */],
        "": [/* specific file-level output fields */]
      }
    }
  }
}

// Specific files and all their contracts:
{
  "settings": {
    "outputSelection": {
      "path/to/my/file.sol": {
        "*": [/* specific contract-level output fields */],
        "": [/* specific file-level output fields */]
      },
      // Rest of files...
    }
  }
}

The contract-level evm output selection

Note

Currently, resolc supports requesting either the full evm output, or one more level of specificity, such as evm.bytecode.

When requesting code generation, such as evm.bytecode or evm.assembly, the resolc compilation process additionally needs ast, metadata, irOptimized, and evm.methodIdentifiers selectors. These selectors will be automatically added if code generation is needed, but will only be included in the output if explicitly requested.

{
  "settings": {
    "outputSelection": {
      "path/to/my/file1.sol": {
        // Contracts in this file will generate bytecode.
        // Only these fields of the JSON output selection will be in the `contracts` output.
        "*": ["abi", "evm.methodIdentifiers", "metadata", "evm.bytecode"],
        // Only this field of the JSON output selection will be in the `sources` output.
        "": ["ast"]
      },
      "path/to/my/file2.sol": {
        // No contracts in this file will generate bytecode.
        "*": ["abi", "evm.methodIdentifiers", "metadata"],
        // No `ast` will be in the `sources` output (only the automatically added `id`,
        // similar to solc as this is not a configurable output selection).
        "": []
      }
    }
  }
}

Differences to EVM

This section highlights some potentially observable differences in the YUL EVM dialect translation compared to Ethereum Solidity.

Solidity developers deploying dApps to pallet-revive ought to read and understand this section well.

Deploy code vs. runtime code

Our contract runtime does not differentiate between runtime code and deploy (constructor) code. Instead, both are emitted into a single PVM contract code blob and live on-chain. Therefore, in EVM terminology, the deploy code equals the runtime code.

Tip

In constructor code, the codesize instruction will return the call data size instead of the actual code blob size.

Solidity

We are aware of the following differences in the translation of Solidity code.

The 63/64 gas rule

pallet-revive doesn’t apply the 63/64 gas rule. We strongly advise changing any code that calls untrusted contracts to supply only a limited amount of gas!
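For context, the rule in question comes from EIP-150: on the EVM, a call can forward at most all but one 64th of the caller's remaining gas, so the caller always keeps a small reserve. The Rust sketch below shows only that arithmetic, to make clear which safety net is absent on pallet-revive. The function names are hypothetical.

```rust
/// Denominator of the fraction of gas the EVM caller always retains (EIP-150).
const RETAINED_GAS_FRACTION: u64 = 64;

/// Maximum gas the EVM forwards to a callee under the 63/64 rule.
/// Hypothetical helper for illustration only.
pub fn evm_max_forwarded_gas(remaining_gas: u64) -> u64 {
    remaining_gas - remaining_gas / RETAINED_GAS_FRACTION
}

/// Gas the EVM caller is guaranteed to keep, even if the callee
/// consumes everything that was forwarded to it.
pub fn evm_retained_gas(remaining_gas: u64) -> u64 {
    remaining_gas - evm_max_forwarded_gas(remaining_gas)
}
```

With 6400000 gas remaining, the EVM forwards at most 6300000 and the caller keeps 100000 to handle the outcome. Since pallet-revive applies no such cap, a callee that is handed all available gas can consume it entirely, which is why an explicit gas limit on calls to untrusted contracts matters.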

address.creationCode

This returns the bytecode keccak256 hash instead.

YUL functions

The below list contains noteworthy differences in the translation of YUL functions.

Note

Many functions receive memory buffer offset pointer or size arguments. The PVM pointer size is 32 bit, so supplying memory offset or buffer size values above 2^32-1 may lead to OutOfGas errors that trap contract execution.

The solc compiler ought to always emit valid memory references, so Solidity dApp authors don’t need to worry about this unless they deal with low level assembly code.

In general, revive preserves the memory layout, meaning low level memory operations are supported. However, a few caveats apply:

  • The EVM linear heap memory is emulated using a fixed byte buffer of 128kb. This implies that the maximum memory a contract can use is limited to 128kb (on Ethereum, contract memory is capped by gas and therefore varies).
  • Thus, accessing memory offsets larger than the fixed buffer size will trap the contract at runtime with an OutOfBound error.
  • The compiler might detect and optimize unused memory reads and writes, leading to a different msize compared to what the EVM would see.
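The two limits described above (32-bit pointers and a fixed-size heap buffer) can be sketched in Rust as follows. This is a simplified model, not the code resolc emits: the error type, the function names and the representation of a 256-bit word as a big-endian byte array are assumptions made for illustration, and the heap size is assumed to be the 131072 byte default.

```rust
/// Assumed heap buffer size in bytes (the `--heap-size` default).
const HEAP_SIZE: u32 = 131_072;
/// Number of leading bytes of a 256-bit word that must be zero
/// for the value to fit into a 32-bit pointer.
const HIGH_BYTES: usize = 28;

/// Hypothetical error type; the real runtime reports traps differently.
#[derive(Debug, PartialEq, Eq)]
pub enum MemoryAccessError {
    /// The 256-bit value does not fit into a 32-bit PVM pointer.
    PointerTooLarge,
    /// The accessed range ends beyond the fixed heap buffer.
    OutOfBounds,
}

/// Narrow a big-endian 256-bit EVM word to a 32-bit pointer.
pub fn narrow_to_pointer(word: &[u8; 32]) -> Result<u32, MemoryAccessError> {
    let (high, low) = word.split_at(HIGH_BYTES);
    if high.iter().any(|byte| *byte != 0) {
        return Err(MemoryAccessError::PointerTooLarge);
    }
    Ok(u32::from_be_bytes([low[0], low[1], low[2], low[3]]))
}

/// Check that `length` bytes starting at `offset` lie inside the heap.
pub fn check_heap_access(offset: u32, length: u32) -> Result<(), MemoryAccessError> {
    match offset.checked_add(length) {
        Some(end) if end <= HEAP_SIZE => Ok(()),
        _ => Err(MemoryAccessError::OutOfBounds),
    }
}
```

Under these assumptions, a 32-byte access at offset 131040 is the last one that fits; the same access at offset 131041 is rejected, as is any offset whose upper 28 bytes are non-zero.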

calldataload, calldatacopy

In the constructor code, the offset is ignored and this always returns 0.

Warning

pallet-revive restricts the calldata size (to 128kb at the time of writing).

codecopy

Only supported in constructor code.

create, create2

Deployments on revive work differently than on the EVM. In a nutshell: Instead of supplying the deploy code concatenated with the constructor arguments (the EVM deploy model), the revive runtime expects two pointers:

  1. A buffer containing the code hash to deploy.
  2. The constructor arguments buffer.

To make contract instantiation using the new keyword in Solidity work seamlessly, revive translates the dataoffset and datasize instructions so that they assume the contract hash instead of the contract code. The hash is always of constant size. Thus, revive is able to supply the expected code hash and constructor arguments pointer to the runtime.
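The difference between the two deploy models can be sketched in Rust. This is a conceptual model only: the types and function names are hypothetical, and the assumption that the buffer built by `new` is laid out as the 32-byte hash followed by the constructor arguments is inferred from the description above rather than taken from the revive sources.

```rust
/// Size of a contract code hash in bytes (the value `datasize` yields on PVM).
const CODE_HASH_SIZE: usize = 32;

/// EVM model: deploy code concatenated with the constructor arguments.
pub fn evm_create_input(deploy_code: &[u8], constructor_arguments: &[u8]) -> Vec<u8> {
    [deploy_code, constructor_arguments].concat()
}

/// Hypothetical view of what the runtime expects instead:
/// a code hash and a separate constructor arguments buffer.
#[derive(Debug, PartialEq, Eq)]
pub struct PvmCreateInput<'a> {
    pub code_hash: [u8; CODE_HASH_SIZE],
    pub constructor_arguments: &'a [u8],
}

/// Split a buffer assumed to be laid out as `hash ++ arguments`.
/// Returns `None` if the buffer is too short to contain a hash.
pub fn split_create_buffer<'a>(buffer: &'a [u8]) -> Option<PvmCreateInput<'a>> {
    if buffer.len() < CODE_HASH_SIZE {
        return None;
    }
    let (hash, constructor_arguments) = buffer.split_at(CODE_HASH_SIZE);
    let mut code_hash = [0u8; CODE_HASH_SIZE];
    code_hash.copy_from_slice(hash);
    Some(PvmCreateInput { code_hash, constructor_arguments })
}
```

Because the hash has a constant size, the split point is always known, which is what lets the same Solidity-level `new` expression work without the contract code being present in the buffer.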

Warning

This might fall apart in code creating contracts inside assembly blocks. We strongly discourage using the create family opcodes to manually craft deployments in assembly blocks! Usually, the reason for using assembly blocks is to save gas, which is likely futile on revive due to the underlying differences in the VM architectures, gas models and transaction costs.

dataoffset

Returns the contract hash.

datasize

Returns the contract hash size (constant value of 32).

revert, return

pallet-revive restricts the returndata size (to 128kb at the time of writing).

prevrandao, difficulty

Translates to a constant value of 2500000000000000.

pc, extcodecopy

These are only valid on the EVM (they also have no use case on PVM) and produce a compile time error.

blobhash, blobbasefee

These are related to the Ethereum rollup model and produce a compile time error. Polkadot offers a superior rollup model, removing the use case for blob data related opcodes.

Difference regarding the solc via-ir mode

There are two different compilation pipelines available in solc and there are small differences between them.

Since resolc processes the YUL IR, always assume the solc IR based codegen behavior for contracts compiled with the revive compiler.

Example: State variable initialization order in inheritance

With via-ir, base constructors run before derived state variables are initialized:

contract InnerContract {
    uint public innerConstructedStartTokenId;

    constructor() {
        innerConstructedStartTokenId = _startTokenId();
    }

    function _startTokenId() internal view virtual returns (uint) {
        return 0;
    }
}

contract Test is InnerContract {
    uint public START_TOKEN_ID = 1;

    constructor() InnerContract() {
    }

    function _startTokenId() internal view virtual override returns (uint) {
        return START_TOKEN_ID;
    }
}

Here, innerConstructedStartTokenId in Test returns 0 (with legacy EVM codegen it’d return 1).

Rust contract libraries

Note

This is not yet implemented but something for consideration on the roadmap.

Solidity, being tightly coupled to the EVM, introduces some inherent inefficiencies that are by design and either need to be followed or can’t be easily worked around, even with efforts like better optimized compiler and VM implementations. This represents a technical dead end. So far the EVM sees no adoption beyond the blockchain industry. Chances are that the EVM ends up deprecated for technical reasons (or maybe not and the RISC-V idea gets abandoned, who knows).

PVM, however, is a general purpose VM. It supports LLVM based mainstream programming languages like Rust. It’s a common software engineering practice to compose applications from pieces written in multiple languages, using each for its own strengths. For example, AI solutions traditionally use the Python scripting language for convenient developer experience, while the underlying AI models get implemented in a lower level language such as C++.

The same pattern can of course be applied to dApps, where we’d expect application specific languages like Solidity mixed with libraries implementing computationally complex algorithms in a lower level language. Business logic and user interfaces are naturally implemented as regular Solidity dApps which can include (link against) Rust libraries. Rust is a fast, safe low level language and the Polkadot SDK is written in Rust itself, making it an excellent choice.

For example, ZK proof verifiers or expensive DeFi primitives would benefit greatly from Rust implementations.

revive provides tooling support and a small Rust contracts SDK for seamless integration with Rust libraries.

revive-runner sandbox

Running contract code usually requires a blockchain node. While local dev nodes can be used, sometimes it’s just not desirable to do so. Instead, it can be much more convenient to run and debug contract code with a stripped down environment.

This is where the revive-runner comes in handy. In a nutshell, it is a single-binary no-blockchain pallet-revive runtime.

Installation and usage

Inside the root revive repository directory, install it from source (requires Rust installed):

make install-revive-runner

After installing, see revive-runner --help for usage help.

Trace logs

The standard RUST_LOG environment variable controls the log output from the contract execution. This includes revive runtime logs and PVM execution trace logs. Sometimes it’s convenient to have finer-grained insight. Some useful filters:

  • RUST_LOG=runtime=trace: The pallet-revive runtime trace logs.
  • RUST_LOG=polkavm=trace: Low level PolkaVM instruction tracing.

Automatic contract instantiation

To avoid running the contract in an uninitialized state, revive-runner automatically instantiates the contract before calling it (constructor arguments can be provided).

Example

Suppose we want to trace the syscalls of the execution of a compiled contract file Flipper.pvm:

RUST_LOG=runtime=trace revive-runner -f Flipper.pvm
[DEBUG runtime::revive] Contract memory usage: purgable=6144/3145728 KB baseline=103063/1572864
[TRACE runtime::revive::strace] call_data_size() = Ok(0) gas_consumed: Weight { ref_time: 985209, proof_size: 0 }
[TRACE runtime::revive::strace] value_transferred(out_ptr: 4294836096) = Ok(()) gas_consumed: Weight { ref_time: 2937634, proof_size: 0 }
[TRACE runtime::revive::strace] call_data_copy(out_ptr: 131216, out_len: 0, offset: 0) = Ok(()) gas_consumed: Weight { ref_time: 4084483, proof_size: 0 }
[TRACE runtime::revive::strace] seal_return(flags: 0, data_ptr: 131216, data_len: 0) = Err(TrapReason::Return(ReturnData { flags: 0, data: [] })) gas_consumed: Weight { ref_time: 5510615, proof_size: 0 }
[TRACE runtime::revive] frame finished with: Ok(ExecReturnValue { flags: (empty), data: [] })
[TRACE runtime::revive::strace] call_data_size() = Ok(0) gas_consumed: Weight { ref_time: 985209, proof_size: 0 }
[TRACE runtime::revive::strace] seal_return(flags: 1, data_ptr: 131088, data_len: 0) = Err(TrapReason::Return(ReturnData { flags: 1, data: [] })) gas_consumed: Weight { ref_time: 2456669, proof_size: 0 }
[TRACE runtime::revive] frame finished with: Ok(ExecReturnValue { flags: REVERT, data: [] })

Developer guide

This chapter covers internal aspects of the compiler and helps contributors get started with the revive codebase.

Contributor guide

The revive compiler is an open source software project and we gladly accept quality contributions from anyone!

Getting started

A quick reference on how to build the Solidity compiler is maintained in the project’s README.md.

Using the Makefile

The Makefile comprehensively encapsulates all development aspects of this codebase. It is kept concise and readable. Please read and use it! You’ll learn for example:

  • How to build and install a resolc development version.
  • How to run tests and benchmarks.
  • How to cross-compile resolc.

As a general rule-of-thumb: If make test runs fine locally, chances for green CI pipelines are good.

Codebase organization

For the most part, revive is a rather standard Rust workspace codebase. There are some non-Rust dependencies, which sometimes complicate things a little bit.

The crates/ dir

All Rust crates live under the crates/ directory. The workspace automatically considers any crate found therein. If you need to add a new crate, please implement it there.

Compiler library crates should be named with the revive- prefix. The crate location doesn’t need the prefix.

Dependencies

Dependencies should be added as workspace dependencies. Try to avoid pinning dependencies whenever possible. If possible, add dev dependencies as dev-dependencies only.

Please do always include the Cargo.lock dependency lock file with your PR. Please don’t run cargo update together with other changes (it is preferred to update the lock file in a dedicated dependency update PR).

Contribution rules

  1. Changes must be submitted via a pull request (PR) to the github upstream repository.
  2. Ensure that your branch passes make test locally when submitting a pull request.
  3. A PR must not be merged until CI fully passes. Exceptions can be made (for example to fix CI issues itself).
  4. No force pushes to the main branch and open PR branches.
  5. Maintainers can request changes or deny contributions at their own discretion.

Style guide

We require the official Rust formatter and clippy linter. In addition to that, please also consider the following best-effort aspects:

  • Avoid magic numbers and strings. Instead, add them as module constants.
  • Avoid abbreviated variable and function names. Always provide meaningful and readable symbols.
  • Don’t write macros and don’t use third party macros for things that can easily be expressed in few lines of code or outlined into functions.
  • Avoid import aliasing. Please use the parent or fully qualified path for conflicting symbols.
  • Any inline comments must provide additional semantic meaning, explain counter-intuitive behavior or highlight non-obvious design decisions. In other words, try to make the code expressive enough to a degree it doesn’t need comments expressing the same thing again in the English language. Delete such comments if your AI assistant generated them.
  • Public items must have a meaningful doc comment.
  • Provide meaningful panic messages to .expect() or just use .unwrap().
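As an illustration of several of these points at once, here is a small Rust sketch. The function and constant are hypothetical and do not exist in the revive codebase; they only show a named module constant in place of a magic number, unabbreviated names, a doc comment on a public item, and a meaningful message passed to .expect().

```rust
/// Stack size in bytes used when no explicit value is supplied.
const DEFAULT_STACK_SIZE: u32 = 131_072;

/// Returns the stack size parsed from the optional command line
/// argument, or the default stack size if no argument was given.
///
/// # Panics
///
/// Panics if the argument is present but not a decimal `u32` value.
pub fn stack_size_or_default(argument: Option<&str>) -> u32 {
    argument.map_or(DEFAULT_STACK_SIZE, |value| {
        value
            .parse()
            .expect("the stack size argument should be a decimal u32 value")
    })
}
```

Calling stack_size_or_default(None) returns 131072, while stack_size_or_default(Some("4096")) returns 4096.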

AI policy

Contributors may use whatever AI assistance tools they wish to whatever degree they wish in the process of creating their contribution, given they acknowledge the following:

Project maintainers may reject any contribution (or portions of it) if the contribution shows signs of problematic involvement of generative AI.

Judgement of “problematic involvement” lies at the sole discretion of project maintainers. No proof (whether a contribution was in fact AI generated or not) is required. Rationale:

  • No one enjoys reading soulless and uncanny LLM slop. Please review and fix any AI slop yourself prior to submitting a PR.
  • A Solidity compiler is security sensitive software. Even minuscule mistakes can ultimately lead to loss of funds. AI models are inherently stochastic. They regularly fail to capture important nuances or produce straight hallucinations. Code that was “blindly” generated has no home here.
  • revive is a large codebase. Generative AI assistants may not have enough “context window” to sufficiently capture correctness, consistency and style aspects of the codebase. We’d like to keep this codebase maintainable by humans for the foreseeable future.

Compiler architecture and internals

revive relies on solc, the Ethereum Solidity compiler, as the Solidity frontend to process smart contracts written in Solidity. LLVM, a popular and powerful compiler framework, is used as the compiler backend and does the heavy lifting in terms of optimizations and RISC-V code generation.

revive mainly takes care of lowering the Yul intermediate representation (IR) produced by solc to LLVM IR. This approach provides a good balance between maintaining a high level of Ethereum compatibility, good contract performance and feasible engineering efforts.

resolc

resolc is the overarching compiler driver library and binary.

When compiling a Solidity source file with resolc, the following steps happen under the hood:

  1. solc is used to lower the Solidity source code into YUL intermediate representation.
  2. revive lowers the YUL IR into LLVM IR.
  3. LLVM optimizes the code and emits a RISC-V ELF shared object (through LLD).
  4. The PolkaVM linker finally links the ELF shared object into a PolkaVM blob.

This compilation process can be visualized as follows:

[Figure: Architecture Overview]

Reproducible contract builds

Because on-chain contract code is identified via its code blob hash, it is crucial to maintain reproducible contract builds. A given compiler version must reproduce the contract build exactly on every target platform resolc supports via the official binary releases.

To ensure this, we employ the following measures:

  • The code generation must be fully deterministic. For example, iterating over a standard HashMap breaks determinism because its iteration order depends on randomized internal state, making it an invalid operation in revive. A BTreeMap, which iterates in key order, can be used instead.
  • We release fully statically linked resolc binaries. This prevents dynamic linking of potentially differentiating libraries.
  • The only non-bundled dependency is the solc compiler. This is considered fine because the same properties apply to solc.
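The first measure can be sketched in Rust. The function below is hypothetical (it is not taken from the revive codebase) and stands in for any code generation step that walks a collection: collecting into a BTreeMap fixes the iteration order by key, so the emitted sequence no longer depends on insertion order or hasher state.

```rust
use std::collections::BTreeMap;

/// Emit one line per function in an order that depends only on the
/// function names, not on the order in which they were discovered.
/// Hypothetical helper for illustration only.
pub fn emit_in_deterministic_order(functions: &[(&str, u32)]) -> Vec<String> {
    // A BTreeMap keeps its entries sorted by key, so iterating over it
    // yields the same sequence on every run and every platform.
    let ordered: BTreeMap<&str, u32> = functions.iter().copied().collect();
    ordered
        .iter()
        .map(|(name, size)| format!("{}:{}", name, size))
        .collect()
}
```

Given the entries transfer, approve and balanceOf in any input order, the output is always approve, balanceOf, transfer. The same loop over a std HashMap could produce a different sequence from one process to the next.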

The revive compiler libraries

The main compiler logic is implemented in the revive-yul and revive-llvm-context crates.

The Yul library implements a lexer and parser and lowers the resulting tree into LLVM IR. It does so by emitting LLVM IR using the LLVM builder and our own revive-llvm-context compiler context crate. The revive LLVM context crate encapsulates code generation logic (decoupled from the parser).

The Yul library also implements a simple visitor interface (see visitor.rs). If you want to work with the AST, it is strongly recommended to implement visitors. The LLVM code generation is implemented using a dedicated trait for historical reasons only.

EVM heap memory

PVM doesn’t offer an API similar to the EVM heap memory. Hence the emitted contract code emulates the linear EVM heap memory using a static byte buffer. Data inside this byte buffer is kept big endian for EVM compatibility reasons (unaligned access is allowed and makes optimizing this non-trivial).

Unlike with the EVM, where heap memory usage is gas metered, our heap size is static (the size is user controllable via a setting flag). The compiler emits bound checks to prevent overflows.
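A minimal Rust model of this emulation might look as follows. It is a sketch under simplifying assumptions, not the code the compiler emits: the type and method names are hypothetical, values are limited to 64 bits for brevity, and an out-of-bounds access returns None where a real contract would trap.

```rust
/// Size of an EVM word in bytes.
const WORD_SIZE: usize = 32;
/// Number of bytes occupied by the 64-bit value inside a word.
const VALUE_SIZE: usize = 8;

/// Hypothetical model of the emulated EVM heap: a fixed-size byte buffer.
pub struct EmulatedHeap {
    buffer: Vec<u8>,
}

impl EmulatedHeap {
    /// Allocate the whole heap up front; it never grows afterwards.
    pub fn new(heap_size: usize) -> Self {
        Self { buffer: vec![0; heap_size] }
    }

    /// `mstore`-like: write `value` as a big-endian 256-bit word.
    pub fn store_word(&mut self, offset: usize, value: u64) -> Option<()> {
        let end = offset.checked_add(WORD_SIZE)?;
        // The bound check: `get_mut` fails if the range leaves the buffer.
        let word = self.buffer.get_mut(offset..end)?;
        word.fill(0);
        word[WORD_SIZE - VALUE_SIZE..].copy_from_slice(&value.to_be_bytes());
        Some(())
    }

    /// `mload`-like: read the low 64 bits of the big-endian word at `offset`.
    pub fn load_word(&self, offset: usize) -> Option<u64> {
        let end = offset.checked_add(WORD_SIZE)?;
        let word = self.buffer.get(offset..end)?;
        let mut low = [0u8; VALUE_SIZE];
        low.copy_from_slice(&word[WORD_SIZE - VALUE_SIZE..]);
        Some(u64::from_be_bytes(low))
    }

    /// Read a single byte, to inspect the in-memory layout.
    pub fn byte(&self, offset: usize) -> Option<u8> {
        self.buffer.get(offset).copied()
    }
}
```

Storing 0x0102 at the unaligned offset 3 places the byte 0x01 at offset 33 and 0x02 at offset 34, matching the big-endian layout the EVM would produce. The byte swap on every access is the price of preserving the EVM memory layout on a little-endian machine.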

The LLVM dependency

LLVM is a special non-Rust dependency. We use its builder interface via the inkwell wrapper crate.

We use upstream LLVM, but release and use our custom builds. We require the compiler builtins specifically built for the PVM rv64emacb target and always leave assertions on. Furthermore, we need cross builds because resolc itself targets emscripten and musl. The revive-llvm-builder functions as a cross-platform build script and is used to build and release the LLVM dependency.

We also maintain the lld-sys crate for interfacing with LLD. The LLVM linker is used during the compilation process, but we don’t want to distribute another binary.

Custom optimizations

An experimental newyork optimizer introduces a custom IR layer between Yul and LLVM IR to capture optimization opportunities that neither solc nor LLVM can realize on their own. solc optimizes for EVM gas on a 256-bit big-endian stack machine, while LLVM lacks the domain knowledge to understand EVM memory semantics or Solidity patterns. The newyork IR bridges this gap with passes for type narrowing, memory optimization, function deduplication, and more.

The newyork optimizer

The newyork crate (crates/newyork/) introduces an additional intermediate representation (IR) layer between Yul and LLVM IR. It enables domain-specific optimizations that neither solc nor LLVM can perform on their own, because they lack semantic knowledge about the cross-domain compilation from EVM to PolkaVM.

Note

The newyork optimizer is experimental. It is gated behind the --newyork CLI flag or the settings.polkavm.newyork field in standard JSON input, and not yet enabled by default.

Motivation

The EVM and PolkaVM are fundamentally different machines:

Property        EVM                                    PolkaVM (RISC-V)
Word size       256-bit                                64-bit
Endianness      Big-endian                             Little-endian
Architecture    Stack machine                          Register-based
Memory model    Linear with free pointer convention    Flat address space

solc optimizes Yul IR for EVM gas costs on a 256-bit big-endian stack machine. LLVM, on the other hand, operates at too low a level to understand EVM memory semantics or Solidity patterns. By the time Yul reaches LLVM IR, the high-level intent is lost.

The newyork IR sits between these two worlds and recovers enough semantic information to make optimization decisions that neither compiler can make alone.

Pipeline overview

                    ┌──────────────────────────────────────────────────┐
Yul AST ──────────► │                  newyork IR                      │ ──► LLVM IR ──► RISC-V
            from_yul│                                                  │ to_llvm
                    │  1. inline                                       │
                    │  2. simplify (pass 1)                            │
                    │  3. dedup (exact + fuzzy)                        │
                    │  4. mem_opt + fmp_prop + keccak_fold             │
                    │  5. simplify (pass 2)                            │
                    │  6. mapping_access_outlining + guard_narrow      │
                    │  7. simplify (pass 3)                            │
                    │  8. dedup (exact + fuzzy, pass 2)                │
                    │  ── recursive on subobjects ──                   │
                    │  9. type_inference (iterative narrowing)         │
                    │ 10. late inline loop: inline, simplify, outline, │
                    │     guard-narrow, simplify, dedup, narrow        │
                    │ 11. heap_opt (analysis)                          │
                    │ 12. validate                                     │
                    └──────────────────────────────────────────────────┘

The optimizer runs the following passes in order:

  1. Inlining – custom heuristics tuned for PolkaVM call overhead, with Tarjan SCC-based recursion detection and quadratic leave-overhead modeling.
  2. Simplify (pass 1) – constant folding, algebraic identities, strength reduction (mul by power-of-2 to shl), copy propagation, dead code elimination, environment read CSE (callvalue, caller, origin, etc.), and revert pattern outlining (panic selectors, custom error selectors).
  3. Function deduplication – exact structural match, then fuzzy dedup (functions differing only in literal constants are parameterized and merged, up to 4 differing positions).
  4. Memory optimization – load-after-store elimination, keccak256 fusion (mstore + keccak256 sequences into Keccak256Single/Keccak256Pair nodes), free memory pointer propagation (replaces mload(0x40) with a known constant), and constant keccak256 folding (precomputes hashes of compile-time-constant inputs).
  5. Simplify (pass 2) – cleans up dead code and new constant expressions exposed by memory optimization and keccak folding.
  6. Compound outlining – detects keccak256_pair + sload/sstore sequences and fuses them into MappingSLoad/MappingSStore IR nodes, eliminating intermediate hash values. Guard narrowing – detects if gt(val, MASK) { revert } and iszero(eq(val, and(val, MASK))) patterns and inserts AND-mask narrowing, giving type inference proof that values fit in fewer bits.
  7. Simplify (pass 3) – propagates opportunities created by compound outlining and guard narrowing.
  8. Function deduplication (pass 2) – catches new duplicates exposed by guard narrowing and compound outlining canonicalization.
  9. Type inference – narrows 256-bit values to smaller widths (I1, I8, I32, I64, I128, I160) where provable. Runs iteratively for up to 4 cascading refinement rounds, combining forward min-width propagation, backward use-context demands, transparent-operation demand propagation, and interprocedural parameter/return narrowing.
  10. Late inline loop – now that narrowing and simplification have shrunk wrapper functions below the inline thresholds, re-runs inlining, simplification, mapping access outlining, guard narrowing, deduplication, and type inference to collect the residual opportunities.
  11. Heap analysis – analyzes memory access patterns (alignment, static offsets, taintedness, escaping regions) to determine which accesses can use native little-endian layout, skipping byte-swap operations. Uses GCD-based alignment propagation and per-region taint tracking.
  12. Validation – checks SSA well-formedness (use-before-def, multiple definitions), yield count consistency, and function reference correctness.

Steps 1-8 run recursively on subobjects (deployed contract code), where optimization impact is greatest. Steps 9-12 run on the full object tree.
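
As an illustration of the strength reduction mentioned in step 2, the following Python sketch rewrites a multiplication by a power of two into a left shift. The tuple-based expression encoding and the function name strength_reduce are assumptions made for this example; the real pass operates on the newyork IR data structures.

```python
def strength_reduce(expr):
    """Rewrite mul(x, 2**k) or mul(2**k, x) into shl(k, x); leave the rest alone."""
    if not (isinstance(expr, tuple) and expr[0] == "mul"):
        return expr
    _, lhs, rhs = expr
    for const, other in ((lhs, rhs), (rhs, lhs)):
        is_pow2 = isinstance(const, int) and const > 0 and const & (const - 1) == 0
        if is_pow2:
            # Yul's shl takes the shift amount first: shl(shift, value).
            return ("shl", const.bit_length() - 1, other)
    return expr


reduced = strength_reduce(("mul", "v0", 8))    # becomes shl(3, v0)
untouched = strength_reduce(("mul", "v0", 6))  # 6 is not a power of two
```

The rewrite is value-preserving because both operations wrap modulo 2^256 on the EVM.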

IR design

The newyork IR is an SSA form with structured control flow, inspired by MLIR’s SCF dialect. Key design choices:

  • Explicit types with address spaces: Every value carries a bit-width (I1, I8, I32, I64, I128, I160, I256) and pointers carry address space information (Heap, Stack, Storage, Code). All values start as I256 and are narrowed by type inference.
  • Pure expressions vs. effectful statements: Expressions compute values without side effects; statements perform memory, storage, or control flow effects. This separation simplifies analysis and rewriting.
  • Semantic annotations: Memory operations are tagged with region information (Scratch, FreePointerSlot, Dynamic). Storage operations carry static slot values when known at compile time.
  • Structured control flow: If, Switch, and For nodes preserve the high-level structure from Yul, with explicit region arguments and yields for value flow across control edges.

For per-operation detail — printed syntax, operand and result types, and more — see the newyork IR reference.

Key optimizations explained

Type narrowing

EVM operates on 256-bit words, but most values in practice fit in 32 or 64 bits. The type inference pass performs bidirectional analysis:

  • Forward: computes minimum width from literal values and operation semantics (e.g., add(I64, I8) produces I65, rounded up to I128).
  • Backward use tracking: classifies each value’s uses into 9 context categories (MemoryOffset, MemoryValue, StorageAccess, Comparison, Arithmetic, FunctionArg, FunctionReturn, ExternalCall, General). All categories conservatively demand the full I256 width by default; the categorization is what enables the interprocedural phase to selectively relax the demand for narrowed function arguments. Earlier versions narrowed directly from the use category, but that was unsound for memory offsets — mload(2^128) aliased to mload(0) because the bounds check ran on an already-truncated value (commit ccca38df).
  • Transparent demand propagation: for modular-arithmetic operations (Add, Sub, Mul, And, Or, Xor), propagates narrow demands backward through operands, exploiting the property that trunc(op(a,b), N) == op(trunc(a,N), trunc(b,N)).
  • Interprocedural: iteratively narrows function parameter and return types in up to four rounds, combining four narrowing strategies — body-driven parameter narrowing, caller-driven parameter narrowing, forward-based return narrowing, and demand-based return narrowing — and re-running full inference between rounds. Parameters are clamped to at least I32 (XLEN on PolkaVM).
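
The forward rule can be sketched in a few lines of Python. The function names are hypothetical, and the "one carry bit on top of the wider operand" rule for add is inferred from the add(I64, I8) example above rather than taken from the implementation.

```python
RUNGS = (1, 8, 32, 64, 128, 160, 256)  # the IR's available bit-widths


def round_up(bits):
    """Smallest available width that can hold `bits` bits (capped at 256)."""
    return next((w for w in RUNGS if w >= bits), 256)


def literal_width(value):
    """Forward width of a literal: the bits needed to represent it."""
    return round_up(max(value.bit_length(), 1))


def add_width(lhs_bits, rhs_bits):
    """Forward width of add: one carry bit on top of the wider operand."""
    return round_up(max(lhs_bits, rhs_bits) + 1)


width = add_width(64, 8)  # 65 bits are needed, which rounds up to 128
```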

This allows LLVM to emit native 32/64-bit instructions instead of software-emulated 256-bit arithmetic, and eliminates expensive multi-instruction comparison sequences (16-20 RISC-V instructions for i256 comparisons reduced to 1-2 for i64).
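
The truncation identity behind transparent demand propagation can be checked numerically. The sketch below is a plain Python model (the helper names are made up for this example) that compares the 256-bit result truncated to N bits against the same operation carried out on truncated operands at width N.

```python
import operator
import random

OPS = {
    "add": operator.add, "sub": operator.sub, "mul": operator.mul,
    "and": operator.and_, "or": operator.or_, "xor": operator.xor,
}


def trunc(value, bits):
    """Keep only the low `bits` bits (two's-complement truncation)."""
    return value % (1 << bits)


def holds(op, a, b, bits):
    """Check trunc(op(a, b), N) == op(trunc(a, N), trunc(b, N)) at width N."""
    f = OPS[op]
    wide = trunc(f(a, b), 256)  # the EVM computes modulo 2**256
    narrow = trunc(f(trunc(a, bits), trunc(b, bits)), bits)
    return trunc(wide, bits) == narrow


random.seed(0)
samples = [(random.getrandbits(256), random.getrandbits(256)) for _ in range(100)]
all_hold = all(holds(op, a, b, 64) for a, b in samples for op in OPS)
```

Division is a counter-example, which is why it is not in the list of transparent operations: with a = 2^64 and b = 2, the wide quotient truncates to 2^63 while the quotient of the truncated operands is 0.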

Guard narrowing

Solidity emits runtime guards that prove values fit in narrow ranges (e.g., address validation via if gt(val, 2^160-1) { revert }). The guard narrowing pass detects these patterns and inserts explicit AND-mask narrowing after the guard. This gives downstream type inference proof that the value fits in fewer bits, enabling cascading narrowing of comparisons, arithmetic, and memory operations that use the guarded value.

Two pattern families are recognized:

  • GT-based guards: if gt(val, MASK) { <terminates> } where MASK is a boundary value like 2^N - 1
  • EQ-based guards: iszero(eq(val, and(val, MASK))) patterns common in Solidity’s address validation
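
Both guard families express the same predicate, and the inserted AND-mask never changes a value that passed the guard. A small Python model, with hypothetical function names and RuntimeError standing in for a revert, makes this explicit:

```python
def gt_guard_reverts(val, n_bits):
    """Models: if gt(val, 2**N - 1) { revert }"""
    return val > (1 << n_bits) - 1


def eq_guard_reverts(val, n_bits):
    """Models: if iszero(eq(val, and(val, 2**N - 1))) { revert }"""
    mask = (1 << n_bits) - 1
    return not (val == (val & mask))


def narrowed_after_guard(val, n_bits):
    """Value seen by code after the guard, with the inserted AND-mask applied."""
    if gt_guard_reverts(val, n_bits):
        raise RuntimeError("revert")
    return val & ((1 << n_bits) - 1)  # a no-op here, but it proves the width


address_like = narrowed_after_guard(2**160 - 1, 160)
```

The mask is redundant at runtime, but it is the syntactic fact that type inference can pick up without reasoning about control flow.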

Heap optimization

PVM doesn’t provide EVM-compatible linear memory, so the compiler emulates it using a byte buffer with byte-swap operations for big-endian compatibility. The heap analysis pass determines which memory accesses can use native little-endian layout by analyzing access patterns:

  • Tracks alignment and static offset information for all memory accesses using GCD-based propagation
  • Propagates taintedness when addresses escape to external calls, are written by external sources (codecopy, calldatacopy), or use unaligned access patterns
  • Tracks variable-accessed offsets to prevent mode mismatches between native and byte-swap accesses to the same location
  • Handles loop-carried variables conservatively (marked as non-literal to prevent false constant propagation)

The codegen backend supports four memory access modes: AllNative (all accesses skip byte-swap), InlineNative (constant-offset accesses use native layout), InlineByteSwap (constant-offset accesses use inline byte-swap), and ByteSwap (standard byte-swap through helper functions).
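
One way to picture GCD-based alignment propagation is an abstract domain in which each offset is summarized by a number it is known to be a multiple of. The Python sketch below is an assumption about how such a domain could look, not a transcription of heap_opt.rs; all names are illustrative.

```python
from math import gcd

UNKNOWN = 1  # every integer is a multiple of 1, so nothing is known


def align_const(c):
    return c  # a constant is a multiple of itself (0 is a multiple of everything)


def align_add(a, b):
    return gcd(a, b)  # a sum is a multiple of the operands' common divisor


def align_mul_const(a, k):
    return a * k  # scaling by a constant scales the known divisor


def is_word_aligned(a):
    return a % 32 == 0


# offset = 0x80 + i * 32 with an unknown index i: still word-aligned
array_slot = align_add(align_const(0x80), align_mul_const(UNKNOWN, 32))
# offset = 0x80 + i with an unknown i: alignment is lost
byte_slot = align_add(align_const(0x80), UNKNOWN)
```

In this model an access such as array_slot could remain a candidate for native layout, while byte_slot would have to be treated as potentially unaligned.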

Free memory pointer range proof

The Solidity free memory pointer (mload(0x40)) always fits in 32 bits — sbrk enforces FMP < heap_size on every store, regardless of which memory mode the contract uses. After every literal mload(0x40), codegen emits a trunc N → zext 256 chain (where N is bits(heap_size - 1), e.g. 17 for the 131,072-byte default heap). The trunc-extend round-trip is a no-op semantically, but exposes the bound to LLVM’s IPSCCP range analysis, which then propagates it through every add(fmp, K) and eliminates the trailing safe_truncate_int_to_xlen overflow checks at every FMP-derived offset use. Despite only affecting a single codegen site, this is the single largest contributor to the optimizer’s code-size reduction.
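
The arithmetic behind N and the no-op round-trip can be reproduced in Python. The function names are hypothetical; only the formula N = bits(heap_size - 1) and the 131,072-byte default come from the text.

```python
def fmp_proof_bits(heap_size):
    """N = bits(heap_size - 1): bits needed for the largest in-bounds offset."""
    return (heap_size - 1).bit_length()


def trunc_then_zext(value, bits):
    """Model of the trunc N -> zext 256 chain applied to a 256-bit value."""
    return value & ((1 << bits) - 1)


n = fmp_proof_bits(131_072)  # the default heap needs 17 bits
round_trip_ok = all(
    trunc_then_zext(fmp, n) == fmp for fmp in (0x80, 0x1234, 131_071)
)
```

Any value that satisfies FMP < heap_size survives the round-trip unchanged, which is why the chain only adds range information and never changes behavior.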

A subtle gating issue: the byte-order mode (InlineNative / ByteSwap) and the value bound on FMP are independent invariants. fmp_native_safe() and can_use_native(0x40) protect against mixing little-endian writers with big-endian readers on the FMP slot, which would corrupt the stored offset; the value bound is unrelated and holds in every mode. Earlier versions of the codegen gated the load-side range proof on the byte-order checks, which suppressed the optimization for any contract with dynamic memory accesses. Decoupling the two reasonings — keeping the byte-order gate on the store side, dropping it from the load-side range proof — is what makes the multiplicative IPSCCP effect available to OZ-class contracts.

Soundness traps for FMP optimizations

The FMP slot is small but easy to mis-optimize. The codebase carries several regression tests for previously-found soundness bugs; new FMP-related changes should be verified against them:

  • mload_at_fmp_slot (crates/integration/src/tests.rs, fixed in 1fd6063c): tests mload(0x40) and offsets near it (0x21, 0x3f, 0x42) on a contract that also performs dynamic mloads. Catches byte-order mismatches when one access goes native (LE) and another goes byte-swap (BE). The fix blocks native mode for FMP whenever has_dynamic_accesses is true.
  • mload_huge_offset_traps (fixed in ccca38df): tests that mload(2^128) and mload(2^255) correctly trap via the gas-exhaustion path. Catches UseContext::MemoryOffset narrowing bypassing the safe_truncate_int_to_xlen overflow check at the use site — mload(2^128) aliasing to mload(0) and returning the zero-initialized scratch slot. The fix classifies MemoryOffset as I256 so it doesn’t drive narrowing; the bounds check at the use site catches out-of-range.
  • FMP i32 shortcut removal (dbcfc921): an earlier optimization stored only 4 bytes at offset 0x40 instead of the full 32-byte EVM word, breaking any inline assembly using mstore(0x40, ...) for non-FMP purposes. Caused a cascade of 249/251 retester failures via allocator corruption. No dedicated regression test was added — the retester corpus was sufficient coverage — but the lesson generalizes: writes to 0x40 must store the full word, even when the high bits are provably zero, because the slot is part of the same 32-byte memory region read by other code.

When adding an optimization that touches FMP, distinguish carefully between: the byte-order encoding at the slot (must be consistent between writers and readers), the value bound (FMP < heap_size, always true), and the stored width (must be 32 bytes for mstore(0x40, ...), even though only the low N bits are non-zero).

Known limitation: dynamic full-word stores to the FMP word

The fmp_could_be_unbounded analysis flags a static mstore(0x40, untrusted) and any dynamic-offset mstore8, but not a dynamic-offset full-word mstore. Such a store whose i256 offset wraps (mod 2²⁵⁶) to the FMP word [0x40, 0x5f] overwrites the free pointer with an arbitrary value, which the load-side range proof would then truncate — a miscompile.

This is a deliberate, documented gap rather than a bug fix because there is no cheap sound discriminator. A store hits the FMP word iff its offset lands in [0x40, 0x5f], which is in-bounds — safe_truncate_int_to_xlen only traps offsets ≥ heap_size — and 256-bit wrap lets any computed add(base, k) reach 0x40 with an adversarial operand, so the offset cannot be proven to miss the slot from width/range information. The only sound recognizer (treat add(fmp, small_const) as ≥ 0x80 by induction on FMP-boundedness) needs new FMP-derivation dataflow, is fragile, and still misses dynamic-index array stores. Conservatively flagging every dynamic full-word store (as the rare dynamic mstore8 does, where it is free) disables the FMP range proof for essentially every contract — measured at roughly +9% / +30 KB on the OpenZeppelin corpus.

The gap is unreachable from solc output: solc’s dynamic memory stores are all free-pointer-relative (≥ 0x80) and never target 0x40. Only hand-written Yul (resolc --yul) with an offset engineered to equal 0x40 reaches it.
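
The wrap-around argument is easy to verify with Python integers. The helper names below are illustrative; the point is that for any base there is an operand k that makes add(base, k) land exactly on 0x40.

```python
MOD = 1 << 256
FMP_SLOT = 0x40


def evm_add(a, b):
    """EVM addition wraps modulo 2**256."""
    return (a + b) % MOD


def adversarial_operand(base):
    """The k for which add(base, k) lands exactly on the FMP slot."""
    return (FMP_SLOT - base) % MOD


k = adversarial_operand(0x80)
target = evm_add(0x80, k)  # wraps around to 0x40
```

Since k is a full 256-bit value and the resulting offset is in bounds, neither width information nor the heap bounds check can rule this store out.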

Keccak256 fusion and folding

Two complementary optimizations target the common Solidity pattern of hashing values for storage slot computation:

  • Fusion: Recognizes mstore + keccak256 sequences and fuses them into dedicated IR nodes (Keccak256Single, Keccak256Pair), eliminating intermediate memory traffic.
  • Constant folding: When all keccak256 inputs are compile-time constants, the hash is computed at compile time and replaced with a literal.
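
The fusion step can be pictured as a peephole rewrite over a statement list. The sketch below uses a made-up tuple encoding and only matches three adjacent statements; the real pass works on the SSA-based newyork IR and has to reason about intervening memory accesses, which this toy version ignores.

```python
def fuse_keccak(stmts):
    """Fuse mstore(0, a); mstore(0x20, b); keccak256(0, 0x40) into one node."""
    out, i = [], 0
    while i < len(stmts):
        window = stmts[i:i + 3]
        if (len(window) == 3
                and window[0][:2] == ("mstore", 0x00)
                and window[1][:2] == ("mstore", 0x20)
                and window[2] == ("keccak256", 0x00, 0x40)):
            a, b = window[0][2], window[1][2]
            out.append(("keccak256_pair", a, b))  # inputs are passed as values
            i += 3
        else:
            out.append(stmts[i])
            i += 1
    return out


program = [
    ("mstore", 0x00, "key"),
    ("mstore", 0x20, "slot"),
    ("keccak256", 0x00, 0x40),
    ("sload", "hash"),
]
fused = fuse_keccak(program)
```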

Known limitation: constant-folding drops the fused-keccak scratch write-back

The fused Keccak256Pair/Keccak256Single helpers write their inputs back to scratch memory ([0, 0x40) / [0, 0x20)), and fusion dead-eliminates the original mstores because that write-back reproduces them. Constant-folding the fused node to a literal removes the helper, so the scratch is left unwritten — a later mload from [0, 0x40) that the optimizer cannot forward (across a region/call boundary) would read stale memory.

This gap is deliberately left open: it is solc-unreachable (solc treats scratch as volatile and never re-reads it as data after a keccak), and every sound fix is a code-size regression because the dropped write-back means the current output is already short the stores (disabling the fold falls back to the runtime keccak helper, +0.78% on the OZ corpus, with no later mem_opt pass to clean re-emitted writes). Only hand-written Yul that reads scratch after a constant-operand keccak across a boundary can observe it.

Compound outlining (mapping access)

Solidity mapping accesses follow a predictable pattern: hash a key with a storage slot, then load/store the result. The compound outlining pass detects keccak256_pair(key, slot) followed by sload/sstore and fuses them into MappingSLoad/MappingSStore IR nodes. These are lowered to outlined helper functions (__revive_mapping_sload, __revive_mapping_sstore) that combine the hash computation with the storage operation, eliminating intermediate values and redundant byte-swaps.

Fuzzy function deduplication

Solidity generates many near-identical functions that differ only in literal constants (e.g., error selectors, storage slot offsets). Fuzzy deduplication identifies such groups, parameterizes the differing literals (up to 4 positions), and replaces all copies with calls to a single shared implementation.
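
A rough Python sketch of the grouping idea follows. Function bodies are modeled as nested tuples with integer literals, and the names skeleton and fuzzy_groups are invented for this example; the actual pass operates on IR functions and also rewrites call sites, which is omitted here.

```python
def skeleton(body, literals):
    """Replace every integer literal with a hole, collecting the literals in order."""
    if isinstance(body, int):
        literals.append(body)
        return "<lit>"
    if isinstance(body, tuple):
        return tuple(skeleton(part, literals) for part in body)
    return body  # operation names and variable names stay as they are


def fuzzy_groups(functions, max_params=4):
    """Group functions whose bodies differ only in at most `max_params` literals."""
    by_shape = {}
    for name, body in functions.items():
        lits = []
        by_shape.setdefault(skeleton(body, lits), []).append((name, lits))
    groups = []
    for members in by_shape.values():
        if len(members) < 2:
            continue
        columns = zip(*(lits for _, lits in members))
        differing = [i for i, col in enumerate(columns) if len(set(col)) > 1]
        if len(differing) <= max_params:
            groups.append(([name for name, _ in members], differing))
    return groups


functions = {
    "revert_a": ("revert_with", ("mstore", 0, 0x11111111), 4),
    "revert_b": ("revert_with", ("mstore", 0, 0x22222222), 4),
    "other": ("add", "x", 1),
}
groups = fuzzy_groups(functions)
```

Each group lists the mergeable functions together with the literal positions that would become parameters of the shared implementation.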

Revert pattern outlining

The simplify pass detects common revert patterns and replaces them with compact IR nodes:

  • Panic reverts: Solidity Panic(uint256) sequences (selector 0x4e487b71 + encoded panic code) are collapsed into PanicRevert { code } nodes, which are lowered to shared helper functions.
  • Custom error reverts: ABI-encoded custom error reverts with known selectors are collapsed into CustomErrorRevert { selector, args } nodes.

These patterns appear dozens of times in typical contracts, and outlining them into shared blocks eliminates significant code duplication.
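
For reference, the payload that a Panic(uint256) revert carries is small and fully determined by the panic code, which is what makes it a good outlining target. The Python helper below (a hypothetical name) builds it from the selector quoted above.

```python
PANIC_SELECTOR = bytes.fromhex("4e487b71")  # selector of Panic(uint256)


def panic_revert_data(code):
    """Revert payload for Panic(code): 4-byte selector + one 32-byte big-endian word."""
    return PANIC_SELECTOR + code.to_bytes(32, "big")


overflow = panic_revert_data(0x11)  # 0x11 is Solidity's arithmetic overflow code
```

Every call site that panics with the same code produces the same 36 bytes, so a shared helper per code is sufficient.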

Outlined helper functions

The LLVM codegen backend generates approximately 15 types of outlined helper functions for common operations:

  • Storage: __revive_sload_word, __revive_sstore_word (handle byte-swap internally)
  • Mapping: __revive_mapping_sload, __revive_mapping_sstore (keccak256 + storage in one call)
  • Callvalue: __revive_callvalue, __revive_callvalue_nonzero (boolean optimization for non-payable checks)
  • Calldataload: __revive_calldataload (outlined when >= 20 call sites)
  • Memory: __revive_store_bswap, __revive_exit_checked, __revive_return_word
  • Errors: __revive_error_string_revert_N, __revive_custom_error_N (per data-word count)
  • Keccak wrappers: __keccak256_slot_N (one noinline wrapper per constant slot, internally dispatching to __revive_keccak256_two_words)

Additionally, common exit patterns (revert with constant length, zero-value returns) are deduplicated into shared LLVM basic blocks, saving hundreds of instruction copies in large contracts.

Codesize results

Integration test contracts

Reproducible with cargo test --package revive-integration -- codesize for the Via Yul IR column (crates/integration/codesize.json) and cargo test --package revive-integration --features newyork -- codesize for the Via newyork IR column (crates/integration/codesize_newyork.json).

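| Contract | Via Yul IR (bytes) | Via newyork IR (bytes) | Reduction |
|---|---|---|---|
| Baseline | 838 | 493 | −41.2% |
| Computation | 2,368 | 1,217 | −48.6% |
| DivisionArithmetics | 11,444 | 7,370 | −35.6% |
| ERC20 | 18,057 | 8,726 | −51.7% |
| Events | 1,614 | 909 | −43.7% |
| FibonacciIterative | 1,373 | 969 | −29.4% |
| Flipper | 2,205 | 1,058 | −52.0% |
| SHA1 | 7,830 | 6,264 | −20.0% |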

OpenZeppelin contracts

Measured against real-world contracts generated with the OpenZeppelin Wizard. The numbers below are a development snapshot.

| Contract | Via newyork IR (bytes) |
|---|---|
| oz_gov | 81,840 |
| erc721 | 52,634 |
| erc20 | 45,703 |
| oz_stable | 45,052 |
| oz_rwa | 41,581 |
| erc1155 | 33,087 |
| oz_simple_erc20 | 17,024 |
| proxy | 3,748 |
| Total | 320,669 |

For comparison, building the same contracts without the newyork optimizer at the equivalent snapshot produced 563,526 bytes in total, so the optimizer shrinks the corpus by about 43%.
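
The corpus total and the overall change can be recomputed from the per-contract numbers in the OpenZeppelin table. The snippet below uses only figures quoted in this section; the variable names are arbitrary.

```python
newyork = {
    "oz_gov": 81_840, "erc721": 52_634, "erc20": 45_703, "oz_stable": 45_052,
    "oz_rwa": 41_581, "erc1155": 33_087, "oz_simple_erc20": 17_024, "proxy": 3_748,
}
baseline_total = 563_526  # total without the newyork optimizer, from the text

total = sum(newyork.values())
change_percent = (total - baseline_total) / baseline_total * 100
```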

Per-contract reductions in the integration suite range from roughly −20% (SHA1, where the bulk of the work is the SHA-1 inner loop and offers little to optimize) to about −52% (Flipper, where the optimizer strips away most of Solidity’s dispatch and storage-access scaffolding).

Development history and challenges

The first version of the newyork optimizer was authored collaboratively and reviewed extensively by the revive maintainers, as well as Claude Opus, Claude Fable, Qwen, Minimax and Deepseek LLMs, over a span of many months — from early February 2026 through mid-June 2026.

The development progressed through several distinct phases:

Phase 1 – Initial scaffolding: The first draft established the core IR data structures, Yul-to-newyork-IR translation, and LLVM codegen. Early commits focused on getting a correct round-trip through the new pipeline.

Phase 2 – Optimization passes: Once the baseline was stable, optimization passes were added iteratively: inlining, simplification, memory optimization, function deduplication, keccak256 fusion, and type inference. Each pass was validated against differential tests comparing EVM and PVM execution.

Phase 3 – Soundness hardening: Several type inference and narrowing approaches turned out to be unsound and had to be reworked:

  • An early type inference approach caused namespace collisions across subobjects and was scoped per-object.
  • Caller-based parameter narrowing was polluted by overly aggressive inference and replaced with body-based structural analysis.
  • Backward demand-driven narrowing required multiple iterations to become provably safe.

Phase 4 – Measuring and tuning: Systematic measurement of OpenZeppelin contracts revealed which optimizations had the most impact and which approaches regressed performance.

Throughout development the optimizer was validated against the existing integration and differential test suites (containing over 30,000 test cases), which run each contract on both EVM and PVM and assert identical state changes.

The newyork compiler pipeline introduced no new regressions over these test suites. This was achieved by careful manual reviews and many LLM bughunt loops. Additionally, a final security review by Anthropic’s Fable 5 LLM found no remaining soundness issues. As with any new compiler feature, it should still be treated as experimental as of now.

Approaches that did not work

| Approach | Outcome |
|---|---|
| Storage bswap decomposition (4x bswap.i64) | Regressed: LLVM handles bswap.i256 better natively |
| NoInline on __revive_int_truncate | +62% regression: PolkaVM call overhead exceeds inline cost |
| Native FMP memory (inline sbrk) | Mixed: small contracts improved, large ones regressed from sbrk bloat |
| Shared overflow trap block | Mixed: prevented LLVM from eliminating individual dead overflow checks |
| Aggressive IR-level single-call inlining | Regressed large contracts (ERC20 +6.1%): merged bodies become monolithic functions LLVM can’t optimize, so large functions are deferred to LLVM’s inliner instead |
| Type-inference narrowing of mload(0x40) to I32 | Regressed small contracts (+252 bytes): conflicts with the codegen FMP range proof; the bound is exposed via a trunc→zext pair instead |
| Full simplifier re-run after mem_opt | Mixed: helped small ERC20 (−293 bytes) but regressed the OZ stablecoin (+72 bytes); replaced by a targeted keccak-only fold |

These results highlight a recurring theme: interacting well with LLVM’s own optimization passes is critical. Optimizations at the IR level can inadvertently inhibit LLVM’s downstream passes, sometimes causing surprising regressions.

Known limitations and future work

The following opportunities have been identified but are not yet implemented:

  • Memory optimization across loop boundaries: Tracked memory state is cleared around for loop condition, body, and post blocks, so load-after-store eliminations do not carry across loop iterations. Preserving loop-invariant state would recover more eliminations.
  • Adaptive inlining thresholds: Current thresholds are static constants. Profile-guided or contract-size-aware heuristics could improve decisions for diverse contract sizes.
  • Extended fuzzy deduplication: The current pass only parameterizes literals in Let bindings and SStore slots. Extending it to consider literals inside MStore, Return, Revert, and Log statements would find more deduplication opportunities.
  • Type checking in validation: The validator checks SSA well-formedness and structure, but not operation type consistency. Type discipline is maintained by construction (type inference and codegen), with LLVM’s IR verifier as the backstop.
  • Loop variable narrowing: Loop-carried variables are conservatively widened to I256. Reaching a fixed-point across loop iterations could allow narrower types for simple counters.
  • Functions with leave inside a for loop are not inlined: the IR-level inliner defers such functions to LLVM’s inliner, so they miss the interprocedural constant propagation and width narrowing the IR-level pass provides.

Debug output

Passing --debug-output-dir <path> makes the newyork pipeline write IR and analysis artifacts for each compiled contract into that directory. The dumps are produced automatically whenever the directory is set.

| File | Content |
|---|---|
| <file-stem>.newyork | Final optimized IR, annotated with the inferred type widths |
| <file-stem>.snapshot.newyork | IR snapshot taken before the late passes (only when captured during translation) |
| <file-stem>.heap.newyork | Heap analysis summary (native regions/offsets, taintedness, escapes, dynamic accesses) |
| <file-stem>.mem.newyork | Memory optimization counters (loads/stores eliminated, keccak fusions, FMP loads eliminated) |

Module reference

| Module | Purpose |
|---|---|
| lib.rs | Pipeline orchestration and pass sequencing |
| ir.rs | Core IR data structures (types, expressions, statements, functions, objects) |
| from_yul.rs | Yul AST to newyork IR translation (two-pass with forward reference support) |
| to_llvm.rs | newyork IR to LLVM IR codegen with outlined helpers and narrowing |
| simplify.rs | Constant folding, algebraic identities, strength reduction, copy propagation, DCE, environment read CSE, revert outlining, callvalue hoisting, function deduplication (exact and fuzzy), constant keccak folding |
| inline.rs | Function inlining with PolkaVM-tuned heuristics (Tarjan SCC, leave elimination) |
| type_inference.rs | Bidirectional integer width narrowing with transparent demand propagation |
| mem_opt.rs | Load-after-store elimination, keccak256 fusion, FMP propagation |
| heap_opt.rs | Heap access pattern analysis, alignment tracking, byte-swap elimination |
| mapping_access_outlining.rs | Mapping access pattern detection and fusion (keccak256_pair + sload/sstore) |
| guard_narrow.rs | Guard pattern detection and AND-mask narrowing insertion |
| validate.rs | IR well-formedness checks (SSA, yields, function references) |
| printer.rs | Human-readable IR pretty printer with configurable output |
| ssa.rs | SSA construction helpers (scope management, phi-node merging) |

newyork IR reference

A per-operation reference for the newyork IR: textual syntax, operand and result types, purity, region and static-slot annotations, and examples.

How to read this reference

This reference page enumerates every operation the newyork IR supports. It is a lookup, not a walkthrough: each entry is self-contained and intended to be reachable by anchor.

Operations are grouped by function (memory and storage writes, pure expressions, control flow, and so on) rather than alphabetically. Jump to a specific operation from the operation index below, or use the sidebar.

Every operation appears in two places in the codebase. The canonical Rust definition is a variant of either Expression or Statement in ir.rs. The textual rendering used by debug dumps and by this reference page is produced by the printer in printer.rs.

Note

Treat the printed syntax as a debug surface, not a stable input language: there is no parser for it, and printer details change when passes add new annotations.

Entry format

Each operation entry has the same shape:

| Field | What it shows |
|---|---|
| Heading | The printed operation name (e.g. mstore) followed by the Expression or Statement variant it corresponds to in ir.rs. |
| Description | A short prose summary of what the operation does and any semantic notes worth knowing before reading the rest of the entry. |
| Syntax | The literal printer output, including any optional debug annotations (region tags, static-slot comments). Anything inside /* ... */ is a debug-only annotation and is not part of the operation itself. |
| Example | A minimal printed snippet, using the printer’s actual v0/v1/… naming. |
| Operands | One row per input or structural participant in the printed syntax. Value operands list the narrowest type the operation guarantees (default i256; narrower widths only appear when type inference has narrowed an upstream definition). Vector-of-operands fields show Vec<…> as the type. Non-value participants such as nested regions are listed with an em-dash type to mark them as structural rather than as operands. |
| Result and purity | The type the operation produces (or none for statements that bind no value), followed by a purity label, either Pure or Effectful. Pure operations may be reordered, deduplicated, or eliminated by the simplifier; effectful ones may not. Effectful entries may carry a parenthetical describing the nature of the side effect when informative (e.g. “control flow”, “terminator”, or a note about revert/trap behavior). |
| Annotations | Operation-specific fields the printer surfaces as /* ... */ comments in the dump (region tag for memory ops, static-slot hint for storage ops, type suffix for non-default widths). Listed here as a table of source field → printed form. |

Syntax notation

Syntax templates in each entry use the following conventions:

| Notation | Meaning |
|---|---|
| add, mload, if, else, case, let, yield, … | Literal printer tokens: bare lowercase identifiers and keywords that the printer emits verbatim. |
| $offset, $value, $key, $lhs, $rhs, … | Role names ($-prefixed): placeholders for SSA value references the printer renders as v followed by a decimal id (v0, v1, …). |
| <type>, <region>, <hex>, <id>, <bits>, <func_name>, <N>, <length>, … | Metavariables: stand for compile-time fields (type tags, hex values, identifier strings, integer counts), not SSA values. The concrete values they take are enumerated in the Annotations section of each entry or in the type system reference. |
| […] | Optional parts. Anything inside the brackets may or may not appear in any given dump, depending on the conditions described in the operation’s Annotations section. |
| [: <type>] | Optional type suffix on a value reference. Suppressed when the value’s type is the default i256 integer; present otherwise (: i32, : ptr<heap>, …). |
| /* … */ | Debug-only annotations the printer attaches to certain operations (memory region tag, static-slot hint, etc.). |
| … | Repetition: “more entries of the same shape.” Used in vector operand lists ($arg_0, $arg_1, …) and in multi-line block bodies ({ … }). |

For instance, this template:

let $result[: <type>] := and($lhs[: <type>], $rhs[: <type>])

prints as:

let v2: i8 := and(v0, v1: i8)

$result rendered as v2 with an i8 type suffix, $lhs as v0 at the default i256 (type suffix omitted), and $rhs as v1 with an i8 type suffix.
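
The suffix rule can be mimicked with a tiny formatter. This Python sketch is not the printer from printer.rs; value_ref and print_and are hypothetical helpers that only reproduce the template and the example above.

```python
def value_ref(value_id, ty="i256"):
    """Render an SSA reference; the suffix is suppressed for the default i256."""
    return f"v{value_id}" if ty == "i256" else f"v{value_id}: {ty}"


def print_and(result, lhs, rhs):
    """Render: let $result[: <type>] := and($lhs[: <type>], $rhs[: <type>])"""
    return f"let {value_ref(*result)} := and({value_ref(*lhs)}, {value_ref(*rhs)})"


line = print_and((2, "i8"), (0,), (1, "i8"))
```

With these inputs, line is the same string as the printed example: let v2: i8 := and(v0, v1: i8).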

Note

A value’s printed width is use-driven. Type inference assigns each value a forward width from its definition, then widens it to satisfy its uses. The type suffix shown for a value in an example (such as i8) is therefore only illustrative — a short example may not show the uses that determine it, and the same operation can appear with a wider suffix, or none (it is omitted for the default i256), in another program.

For instance, a value used as a memory offset widens to i64; as an address (a call target, extcodesize) to i160; stored as a full word (an mstore/sstore value) to i256; and an add/mul operand up to the i64 register width.

Operation index

Pure expressions

Constants and variables
Arithmetic
Bit-width conversions
Hashing
Environment reads
Calldata, returndata, and code
Memory and storage loads
Linker
Function call

Memory and storage writes

Bulk copies

Bindings and wrappers

Structured control flow

External interaction

Termination

Type system

Every value in the IR carries a Type. The operation entries below refer to widths (i1–i256), address spaces (ptr<heap>, etc.), and memory regions (scratch, etc.) by their printed form; this section is the reference for those names.

Type

The umbrella enum, with these variants:

| Variant | Printed as | Description |
|---|---|---|
| Int(BitWidth) | i1, i8, …, i256 | An integer at one of the BitWidth widths. |
| Ptr(AddressSpace) | ptr<heap>, ptr<stack>, ptr<storage>, ptr<code> | A pointer tagged with its address space; see AddressSpace. |
| Void | void | Unit type. Used for statements that produce no value and for void-returning functions. |

BitWidth

The rungs of integer width. Newly minted values default to I256; type inference narrows them down to one of the lower rungs when it can prove the upper bits are zero or unused.

| Variant | Printed as | Typical use |
| --- | --- | --- |
| I1 | i1 | Boolean. Result type of every comparison and iszero. |
| I8 | i8 | Byte values. The narrowest meaningful integer. |
| I32 | i32 | PolkaVM pointer width (XLEN); minimum width for function parameters under the rv64e ABI. |
| I64 | i64 | PolkaVM native register width; most narrowed values land here. |
| I128 | i128 | Two registers; arithmetic that overflows i64 but doesn’t need full 256-bit emulation. |
| I160 | i160 | Ethereum addresses; result of caller, origin, mapping keys. |
| I256 | i256 | EVM word width. The default and conservative ceiling. |
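As a rough illustration of the width ladder, the following Python sketch picks the narrowest tier that can hold a constant. `narrowest_tier` is a hypothetical helper, not part of the compiler, and it models only the forward width of a constant; as noted earlier, uses can widen a value beyond this.

```python
# Tier ladder copied from the BitWidth table.
TIERS = [1, 8, 32, 64, 128, 160, 256]


def narrowest_tier(value: int) -> int:
    """Return the narrowest tier (in bits) that can hold `value`."""
    if not 0 <= value < 2**256:
        raise ValueError("value must fit in an EVM word")
    needed = max(value.bit_length(), 1)  # zero still occupies one bit
    return next(tier for tier in TIERS if tier >= needed)


print(narrowest_tier(0x2A))      # a byte-sized constant
print(narrowest_tier(2**160))    # one bit too wide for an address
```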

AddressSpace

The address space a pointer points into. Carried on every Ptr value so the codegen can lower loads and stores without a separate alias-analysis pass.

| Variant | Printed as | Points into | Endianness |
| --- | --- | --- | --- |
| Heap | ptr<heap> | Emulated EVM linear memory (the simulated mload/mstore region). | Big-endian (by EVM contract). |
| Stack | ptr<stack> | Native PolkaVM stack allocations. | Little-endian (no swap). |
| Storage | ptr<storage> | Contract storage; key/value with 256-bit slots. | Big-endian on the wire. |
| Code | ptr<code> | Read-only code/data segment. | Big-endian. |

MemoryRegion

A refinement carried by every memory load and store on top of AddressSpace::Heap. The tag tells later passes what kind of heap address an offset is hitting, which drives both free-memory-pointer propagation and byte-swap elimination.

| Variant | Address range | Printed as | Meaning |
| --- | --- | --- | --- |
| Scratch | 0x00–0x3f | /* scratch */ | EVM scratch space; safe to touch without consulting the free memory pointer. |
| FreePointerSlot | exactly 0x40 | /* free_ptr */ | Slot that stores the free memory pointer itself. |
| Dynamic | 0x80 and above | /* dynamic */ | Real heap allocations. |
| Unknown | everything else (constants in 0x41–0x7f, plus all non-constant offsets) | (suppressed) | Conservative fallback used when the offset isn’t a constant or doesn’t slot cleanly. |
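The address ranges in the MemoryRegion table translate directly into a small classifier. The Python sketch below is illustrative: `classify_region` is a hypothetical name, and a non-constant offset is modelled as `None`.

```python
from typing import Optional


def classify_region(offset: Optional[int]) -> str:
    """Map a constant heap offset (None = non-constant) to a region tag."""
    if offset is None:
        return "unknown"      # non-constant offset: conservative fallback
    if 0x00 <= offset <= 0x3F:
        return "scratch"
    if offset == 0x40:
        return "free_ptr"
    if offset >= 0x80:
        return "dynamic"
    return "unknown"          # constants in 0x41..0x7f


for off in (0x00, 0x40, 0x60, 0x80, None):
    print(off, classify_region(off))
```

Only the first three tags are ever printed in a dump; `unknown` corresponds to the suppressed annotation.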

Pure expressions

Pure expressions produce values without side effects. The simplifier may freely reorder, deduplicate, and eliminate them. They appear on the right-hand side of a let binding, or as operands of other expressions and effectful statements; the operand positions accept SSA value references only, so any pure expression that is consumed elsewhere is first bound by a let. Examples in this section wrap each expression in a let v := … to give it somewhere to land.

0x<hex>

(Expression::Literal)

Description

A compile-time constant value with a declared type. New literals minted by the translator default to Int(I256); passes that synthesize constants at narrower widths (e.g. a one-bit boolean from a constant comparison) attach the narrower type directly.

Syntax

0x<hex>[: <type>]

Example

let v0: i8 := 0x2a
let v1: i1 := 0x1           // boolean true
let v2: i64 := 0x80

Operands

None — literals are leaves.

Result and purity

| Result | Purity |
| --- | --- |
| Same as the literal’s value_type | Pure |

Annotations

| Source field | Printed as |
| --- | --- |
| value: BigUint | 0x<hex> in the syntax position (not a comment annotation; it is the expression itself) |
| value_type: Type | : <type> suffix when value_type is not the default Int(I256); suppressed otherwise |

v<id>

(Expression::Var)

Description

A reference to an existing SSA value, used as the entire right-hand side of a let. In a typical dump this is rare because the simplifier collapses let v := v<id> into the consumers of v via copy propagation; expect to see it only in dumps taken before simplification has run.

Syntax

v<id>

Example

let v5 := v3                // copy; usually eliminated by simplify

Operands

None — the expression is the value reference itself.

Result and purity

| Result | Purity |
| --- | --- |
| Same as the referenced value’s type | Pure |

Annotations

None.

add

(Expression::Binary with BinaryOperation::Add)

Description

Modular addition. Wraps on overflow; per EVM, the result is (lhs + rhs) mod 2^256 regardless of operand widths.

Syntax

add($lhs[: <type>], $rhs[: <type>])

Example

let v2 := add(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | |
| rhs | i256 | |

Result and purity

| Result | Purity |
| --- | --- |
| widen_by_one(max(width(lhs), width(rhs))): one tier above the wider operand, to account for the carry bit | Pure |

Annotations

None.

sub

(Expression::Binary with BinaryOperation::Sub)

Description

Modular subtraction. Wraps on underflow; the result is (lhs - rhs) mod 2^256 regardless of operand widths.

Syntax

sub($lhs[: <type>], $rhs[: <type>])

Example

let v2 := sub(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | |
| rhs | i256 | |

Result and purity

| Result | Purity |
| --- | --- |
| i256 (conservative: underflow on narrower operands could borrow into upper bits) | Pure |

Annotations

None.

mul

(Expression::Binary with BinaryOperation::Mul)

Description

Modular multiplication. The result is (lhs * rhs) mod 2^256.

Syntax

mul($lhs[: <type>], $rhs[: <type>])

Example

let v2 := mul(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | |
| rhs | i256 | |

Result and purity

| Result | Purity |
| --- | --- |
| double_width(max(width(lhs), width(rhs))): the tier holding twice the wider operand’s bits (skipping i160 at the i128 to i256 transition) | Pure |

Annotations

None.
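The result-width rules for add and mul can be sketched over the tier ladder as follows. This is a minimal Python model, not the type-inference code: the function bodies are assumptions derived from the one-line descriptions in the result tables. In particular, the sketch assumes `widen_by_one` steps through every tier (so i128 goes to i160) and that i256 is a fixed ceiling.

```python
TIERS = [1, 8, 32, 64, 128, 160, 256]


def widen_by_one(width: int) -> int:
    """Next tier above `width`; i256 is the ceiling and stays put."""
    index = TIERS.index(width)
    return TIERS[min(index + 1, len(TIERS) - 1)]


def double_width(width: int) -> int:
    """Narrowest tier holding 2 * width bits, capped at 256."""
    target = min(2 * width, 256)
    return next(tier for tier in TIERS if tier >= target)


def add_result(lhs: int, rhs: int) -> int:
    return widen_by_one(max(lhs, rhs))


def mul_result(lhs: int, rhs: int) -> int:
    return double_width(max(lhs, rhs))


print(add_result(8, 32))     # one tier above i32
print(mul_result(128, 64))   # 256 bits needed, so i160 is skipped
```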

div

(Expression::Binary with BinaryOperation::Div)

Description

Unsigned integer division. Per EVM, div(x, 0) = 0 (no trap on division by zero).

Syntax

div($lhs[: <type>], $rhs[: <type>])

Example

let v2 := div(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | Dividend. |
| rhs | i256 | Divisor; 0 yields a result of 0, not a trap. |

Result and purity

| Result | Purity |
| --- | --- |
| width(lhs): the quotient cannot exceed the dividend | Pure |

Annotations

None.

sdiv

(Expression::Binary with BinaryOperation::SDiv)

Description

Signed two’s-complement integer division. Per EVM, sdiv(x, 0) = 0; quotient is truncated toward zero.

Syntax

sdiv($lhs[: <type>], $rhs[: <type>])

Example

let v2 := sdiv(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | Dividend, treated as signed. |
| rhs | i256 | Divisor, treated as signed; 0 yields 0. |

Result and purity

| Result | Purity |
| --- | --- |
| max(width(lhs), width(rhs)): a negative divisor can push the result to full width | Pure |

Annotations

None.

mod

(Expression::Binary with BinaryOperation::Mod)

Description

Unsigned modulo. Per EVM, mod(x, 0) = 0.

Syntax

mod($lhs[: <type>], $rhs[: <type>])

Example

let v2 := mod(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | Dividend. |
| rhs | i256 | Divisor; 0 yields 0. |

Result and purity

| Result | Purity |
| --- | --- |
| width(lhs) | Pure |

Annotations

None.

smod

(Expression::Binary with BinaryOperation::SMod)

Description

Signed modulo. Per EVM, smod(x, 0) = 0; the result takes the sign of the dividend.

Syntax

smod($lhs[: <type>], $rhs[: <type>])

Example

let v2 := smod(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | Dividend, treated as signed. |
| rhs | i256 | Divisor, treated as signed; 0 yields 0. |

Result and purity

| Result | Purity |
| --- | --- |
| width(lhs) | Pure |

Annotations

None.
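The division and modulo semantics described for div, sdiv, mod, and smod can be checked against a small reference model. The Python sketch below works on 256-bit words represented as non-negative integers; the helper names `to_signed` and `to_word` are illustrative and do not come from the compiler.

```python
WORD = 1 << 256
SIGN_BIT = 1 << 255


def to_signed(x: int) -> int:
    """Interpret a 256-bit word as a two's-complement integer."""
    return x - WORD if x & SIGN_BIT else x


def to_word(x: int) -> int:
    """Wrap a Python integer back into a 256-bit word."""
    return x % WORD


def div(x: int, y: int) -> int:
    return 0 if y == 0 else x // y


def mod(x: int, y: int) -> int:
    return 0 if y == 0 else x % y


def sdiv(x: int, y: int) -> int:
    if y == 0:
        return 0
    a, b = to_signed(x), to_signed(y)
    quotient = abs(a) // abs(b)          # truncate toward zero
    if (a < 0) != (b < 0):
        quotient = -quotient
    return to_word(quotient)


def smod(x: int, y: int) -> int:
    if y == 0:
        return 0
    a, b = to_signed(x), to_signed(y)
    remainder = abs(a) % abs(b)
    return to_word(-remainder if a < 0 else remainder)  # sign of the dividend


print(to_signed(sdiv(to_word(-7), 2)))
print(to_signed(smod(to_word(-7), 2)))
```

Note that Python's own `//` and `%` round toward negative infinity, which is why the signed operations go through absolute values.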

exp

(Expression::Binary with BinaryOperation::Exp)

Description

Modular exponentiation: (base ^ exponent) mod 2^256. The most expensive arithmetic opcode in EVM (variable gas cost proportional to the byte length of exponent).

Syntax

exp($base[: <type>], $exponent[: <type>])

Example

let v2 := exp(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| base | i256 | Base. |
| exponent | i256 | Exponent. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 (conservative: exponentiation can fill any width) | Pure |

Annotations

None.

and

(Expression::Binary with BinaryOperation::And)

Description

Bitwise AND. The common idiom for type narrowing: a constant mask on the right lets forward analysis pick up a tight result width.

Syntax

and($lhs[: <type>], $rhs[: <type>])

Example

let v2 := and(v0, v1)
let v3: i8 := 0xff
let v4: i8 := and(v0, v3: i8)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | |
| rhs | i256 | |

Result and purity

| Result | Purity |
| --- | --- |
| min(width(lhs), width(rhs)): AND can only clear bits, so the result fits in the narrower operand | Pure |

Annotations

None.

or

(Expression::Binary with BinaryOperation::Or)

Description

Bitwise OR.

Syntax

or($lhs[: <type>], $rhs[: <type>])

Example

let v2 := or(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | |
| rhs | i256 | |

Result and purity

| Result | Purity |
| --- | --- |
| max(width(lhs), width(rhs)) | Pure |

Annotations

None.

xor

(Expression::Binary with BinaryOperation::Xor)

Description

Bitwise XOR.

Syntax

xor($lhs[: <type>], $rhs[: <type>])

Example

let v2 := xor(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | |
| rhs | i256 | |

Result and purity

| Result | Purity |
| --- | --- |
| max(width(lhs), width(rhs)) | Pure |

Annotations

None.

shl

(Expression::Binary with BinaryOperation::Shl)

Description

Logical left shift. Operand order follows EVM: shl(shift, value) computes value << shift. Shifts ≥ 256 produce 0.

Syntax

shl($shift[: <type>], $value[: <type>])

Example

let v2 := shl(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| shift | i256 | Shift amount in bits. |
| value | i256 | Value to shift. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 (conservative: bits may shift into any width) | Pure |

Annotations

None.

shr

(Expression::Binary with BinaryOperation::Shr)

Description

Logical right shift. Operand order follows EVM: shr(shift, value) computes value >> shift with zero-fill from the left. Shifts ≥ 256 produce 0.

Syntax

shr($shift[: <type>], $value[: <type>])

Example

let v2 := shr(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| shift | i256 | Shift amount in bits. |
| value | i256 | Value to shift. |

Result and purity

| Result | Purity |
| --- | --- |
| If shift is a known constant k: tier holding 256 - k bits (or i1 for k ≥ 256). Otherwise: width(value). | Pure |

Annotations

None.

sar

(Expression::Binary with BinaryOperation::Sar)

Description

Arithmetic (signed) right shift. Operand order follows EVM: sar(shift, value) shifts value right by shift bits, preserving the sign bit. Shifts ≥ 256 saturate to 0 for non-negative values and to -1 (all-ones) for negative values.

Syntax

sar($shift[: <type>], $value[: <type>])

Example

let v2 := sar(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| shift | i256 | Shift amount in bits. |
| value | i256 | Value to shift, treated as signed. |

Result and purity

| Result | Purity |
| --- | --- |
| width(value): unlike shr, sign-extension means a constant shift cannot narrow the result | Pure |

Annotations

None.
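The three shift operations share the EVM operand order (shift first, value second) and differ only in how they fill vacated bits. A minimal Python reference model over 256-bit words, written for illustration rather than taken from the compiler:

```python
WORD = 1 << 256
MASK = WORD - 1


def shl(shift: int, value: int) -> int:
    """Logical left shift; shifts of 256 or more produce 0."""
    return 0 if shift >= 256 else (value << shift) & MASK


def shr(shift: int, value: int) -> int:
    """Logical right shift with zero fill; shifts of 256 or more produce 0."""
    return 0 if shift >= 256 else value >> shift


def sar(shift: int, value: int) -> int:
    """Arithmetic right shift; large shifts saturate to 0 or all-ones."""
    signed = value - WORD if value >> 255 else value
    if shift >= 256:
        return MASK if signed < 0 else 0
    # Python's >> on negative integers already replicates the sign bit.
    return (signed >> shift) & MASK


print(hex(shr(4, 0xF0)))
print(sar(300, MASK) == MASK)
```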

lt

(Expression::Binary with BinaryOperation::Lt)

Description

Unsigned less-than comparison. Returns 1 if lhs < rhs, else 0.

Syntax

lt($lhs[: <type>], $rhs[: <type>])

Example

let v2: i1 := lt(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | Compared unsigned. |
| rhs | i256 | Compared unsigned. |

Result and purity

| Result | Purity |
| --- | --- |
| i1 | Pure |

Annotations

None.

gt

(Expression::Binary with BinaryOperation::Gt)

Description

Unsigned greater-than comparison. Returns 1 if lhs > rhs, else 0.

Syntax

gt($lhs[: <type>], $rhs[: <type>])

Example

let v2: i1 := gt(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | Compared unsigned. |
| rhs | i256 | Compared unsigned. |

Result and purity

| Result | Purity |
| --- | --- |
| i1 | Pure |

Annotations

None.

slt

(Expression::Binary with BinaryOperation::Slt)

Description

Signed less-than comparison. Returns 1 if lhs < rhs, else 0, with both operands interpreted as two’s complement.

Syntax

slt($lhs[: <type>], $rhs[: <type>])

Example

let v2: i1 := slt(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | Compared signed. |
| rhs | i256 | Compared signed. |

Result and purity

| Result | Purity |
| --- | --- |
| i1 | Pure |

Annotations

None.

sgt

(Expression::Binary with BinaryOperation::Sgt)

Description

Signed greater-than comparison. Returns 1 if lhs > rhs, else 0, with both operands interpreted as two’s complement.

Syntax

sgt($lhs[: <type>], $rhs[: <type>])

Example

let v2: i1 := sgt(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | Compared signed. |
| rhs | i256 | Compared signed. |

Result and purity

| Result | Purity |
| --- | --- |
| i1 | Pure |

Annotations

None.

eq

(Expression::Binary with BinaryOperation::Eq)

Description

Equality comparison. Returns 1 if lhs == rhs, else 0. Signedness is irrelevant.

Syntax

eq($lhs[: <type>], $rhs[: <type>])

Example

let v2: i1 := eq(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| lhs | i256 | |
| rhs | i256 | |

Result and purity

| Result | Purity |
| --- | --- |
| i1 | Pure |

Annotations

None.

byte

(Expression::Binary with BinaryOperation::Byte)

Description

Extract a single byte from a 256-bit word. byte(i, x) returns the i-th byte of x with byte 0 being the most significant. If i ≥ 32, the result is 0.

Syntax

byte($index[: <type>], $word[: <type>])

Example

let v2: i8 := byte(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| index | i256 | Byte position; 0 = most significant byte. Values ≥ 32 yield 0. |
| word | i256 | Source word. |

Result and purity

| Result | Purity |
| --- | --- |
| i8 | Pure |

Annotations

None.
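Because byte indexes from the most significant end, the index has to be flipped before shifting. A short Python sketch of the semantics described above (an illustrative reference model, not the compiler's lowering):

```python
def byte(index: int, word: int) -> int:
    """Return byte `index` of a 256-bit word, with byte 0 most significant."""
    if index >= 32:
        return 0
    return (word >> (8 * (31 - index))) & 0xFF


print(hex(byte(31, 0x1234)))  # least significant byte
print(hex(byte(30, 0x1234)))  # the byte above it
```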

signextend

(Expression::Binary with BinaryOperation::SignExtend)

Description

Sign-extend an integer from a byte position. Per EVM, signextend(b, x) treats byte b of x, counting from the least significant byte, as the most significant byte of a smaller signed integer and extends its sign through the upper bytes.

Syntax

signextend($byte_position[: <type>], $value[: <type>])

Example

let v2 := signextend(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| byte_position | i256 | Byte position of the sign byte (0–31). |
| value | i256 | Source value. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 (the extended value occupies the full word) | Pure |

Annotations

The width-targeted sign-extension primitive sext<i<bits>> (Expression::SignExtendTo) is a separate operation; see the bit-width conversions section.
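A small Python model may help with the byte-position convention of signextend. It follows the EVM rule that position 0 is the least significant byte, so the sign bit sits at bit 8 * b + 7; the function below is an illustrative sketch, not the compiler's implementation.

```python
MASK = (1 << 256) - 1


def signextend(byte_position: int, value: int) -> int:
    """Extend the sign bit of byte `byte_position` through the upper bytes."""
    if byte_position >= 31:
        return value & MASK               # nothing above the sign byte
    sign_bit = 1 << (8 * byte_position + 7)
    low_mask = (sign_bit << 1) - 1        # the sign bit and everything below it
    low = value & low_mask
    if low & sign_bit:
        return low | (MASK ^ low_mask)    # fill the upper bits with ones
    return low                            # upper bits cleared


print(signextend(0, 0xFF) == MASK)   # 0xff as a signed byte is -1
print(hex(signextend(0, 0x7F)))
```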

addmod

(Expression::Ternary with BinaryOperation::AddMod)

Description

Ternary modular addition: (a + b) mod n, computed without intermediate overflow. Per EVM, n = 0 yields 0.

Syntax

addmod($a[: <type>], $b[: <type>], $n[: <type>])

Example

let v3 := addmod(v0, v1, v2)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| a | i256 | First addend. |
| b | i256 | Second addend. |
| n | i256 | Modulus; 0 yields 0. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 (conservative) | Pure |

Annotations

None.

mulmod

(Expression::Ternary with BinaryOperation::MulMod)

Description

Ternary modular multiplication: (a * b) mod n, computed without intermediate overflow. Per EVM, n = 0 yields 0.

Syntax

mulmod($a[: <type>], $b[: <type>], $n[: <type>])

Example

let v3 := mulmod(v0, v1, v2)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| a | i256 | First factor. |
| b | i256 | Second factor. |
| n | i256 | Modulus; 0 yields 0. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 (conservative) | Pure |

Annotations

None.
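"Computed without intermediate overflow" means addmod and mulmod are not the same as a wrapping add or mul followed by mod. The Python sketch below, an illustrative model using arbitrary-precision integers, shows a case where the two differ.

```python
WORD = 1 << 256


def add(a: int, b: int) -> int:
    """Wrapping 256-bit addition, for comparison."""
    return (a + b) % WORD


def addmod(a: int, b: int, n: int) -> int:
    """(a + b) mod n on the full-precision sum; n = 0 yields 0."""
    return 0 if n == 0 else (a + b) % n


def mulmod(a: int, b: int, n: int) -> int:
    """(a * b) mod n on the full-precision product; n = 0 yields 0."""
    return 0 if n == 0 else (a * b) % n


big = WORD - 1
print(addmod(big, 2, 7))     # uses the 257-bit sum
print(add(big, 2) % 7)       # wraps to 1 first, then reduces
```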

iszero

(Expression::Unary with UnaryOperation::IsZero)

Description

Returns 1 if the operand is 0, else 0. Also serves as the logical NOT for boolean values.

Syntax

iszero($operand[: <type>])

Example

let v1: i1 := iszero(v0)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| operand | i256 | |

Result and purity

| Result | Purity |
| --- | --- |
| i1 | Pure |

Annotations

None.

not

(Expression::Unary with UnaryOperation::Not)

Description

Bitwise complement. Inverts every bit; equivalent to xor(operand, 2^256 - 1).

Syntax

not($operand[: <type>])

Example

let v1 := not(v0)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| operand | i256 | |

Result and purity

| Result | Purity |
| --- | --- |
| i256 (the complement fills the full word regardless of operand width) | Pure |

Annotations

None.

clz

(Expression::Unary with UnaryOperation::Clz)

Description

Count leading zeros. Returns the number of leading zero bits in the operand, where a value of 0 returns 256 (the full width). Not an EVM opcode; reaches newyork as a Yul builtin (FunctionName::Clz) and is translated directly by the Yul-to-newyork translator.

Syntax

clz($operand[: <type>])

Example

let v1 := clz(v0)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| operand | i256 | |

Result and purity

| Result | Purity |
| --- | --- |
| i256 (in practice the value fits in nine bits, max 256, so type inference often narrows further) | Pure |

Annotations

None.
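For a 256-bit word, the leading-zero count is the word width minus the position of the highest set bit. A one-line Python model of the behaviour described above (illustrative only):

```python
def clz(operand: int) -> int:
    """Count leading zero bits of a 256-bit word; clz(0) is 256."""
    return 256 - operand.bit_length()


print(clz(0), clz(1), clz(1 << 255))
```

The largest possible result, 256, needs nine bits, which is why the result can be narrowed well below i256.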

truncate<i<bits>>

(Expression::Truncate)

Description

Reinterpret a wider integer as a narrower one by discarding the upper bits. The destination width is carried in the IR’s to: BitWidth field and is rendered inside the angle brackets of the printer mnemonic. Narrowing-only; the source width must be greater than or equal to the destination width.

Syntax

truncate<i<bits>>($value[: <type>])

Example

let v1: i64 := truncate<i64>(v0)
let v2: i8 := truncate<i8>(v1: i64)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| value | i256 | Source value; must be at least as wide as the destination. |

Result and purity

| Result | Purity |
| --- | --- |
| The destination width from the to field | Pure |

Annotations

None. The destination width is part of the operation name, not a debug annotation.

zext<i<bits>>

(Expression::ZeroExtend)

Description

Reinterpret a narrower integer as a wider one by zero-filling the upper bits. The destination width is carried in the IR’s to: BitWidth field. Widening-only.

Syntax

zext<i<bits>>($value[: <type>])

Example

let v1 := zext<i256>(v0: i8)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| value | i256 | Source value; must be no wider than the destination. |

Result and purity

| Result | Purity |
| --- | --- |
| The destination width from the to field | Pure |

Annotations

None.

sext<i<bits>>

(Expression::SignExtendTo)

Description

Reinterpret a narrower signed integer as a wider one by sign-extending the high bit. The destination width is carried in the IR’s to: BitWidth field. Distinct from signextend (Expression::Binary), which is the EVM byte-position primitive; this one specifies the destination width directly and is introduced by passes that produce a sign-extended value at a known target width.

Syntax

sext<i<bits>>($value[: <type>])

Example

let v1 := sext<i256>(v0: i64)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| value | i256 | Source value; must be no wider than the destination. |

Result and purity

| Result | Purity |
| --- | --- |
| The destination width from the to field | Pure |

Annotations

None.
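The three width conversions differ only in what happens to the upper bits. The Python sketch below models them on plain integers. The explicit `from_bits` parameter is an assumption of this sketch: in the IR the source width comes from the operand's type rather than from an argument.

```python
def truncate(value: int, to_bits: int) -> int:
    """Keep the low `to_bits` bits and discard the rest."""
    return value & ((1 << to_bits) - 1)


def zext(value: int, from_bits: int, to_bits: int) -> int:
    """Widen by zero-filling; the numeric value is unchanged."""
    assert from_bits <= to_bits, "zext is widening-only"
    return value & ((1 << from_bits) - 1)


def sext(value: int, from_bits: int, to_bits: int) -> int:
    """Widen by copying the source's top bit into the new upper bits."""
    assert from_bits <= to_bits, "sext is widening-only"
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value


print(hex(truncate(0x1234, 8)))
print(hex(sext(0x80, 8, 32)))
```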

keccak256

(Expression::Keccak256)

Description

Compute the Keccak-256 hash of length bytes of emulated EVM linear memory starting at offset. The general-purpose hashing primitive; the specialized variants below cover the common scratch-space patterns more compactly.

Syntax

keccak256($offset[: <type>], $length[: <type>])

Example

let v2 := keccak256(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| offset | i256 | Byte offset into linear memory; forward analysis widens to at least i64. |
| length | i256 | Length of the region to hash, in bytes; forward analysis widens to at least i64. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure: the hash is a deterministic function of the memory contents at evaluation time. Passes that hoist or dedupe must respect intervening memory writes. |

Annotations

None.

keccak256_pair

(Expression::Keccak256Pair)

Description

Compound hash of two 256-bit words. Equivalent to mstore(0, word0); mstore(32, word1); keccak256(0, 64) but emitted as a single outlined call after mem_opt’s keccak fusion recognizes the pattern. The mapping-key idiom; see also mapping_sload.

Syntax

keccak256_pair($word0[: <type>], $word1[: <type>])

Example

let v2 := keccak256_pair(v0, v1)

Operands

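| Name | Type | Notes |
| --- | --- | --- |
| word0 | i256 | First word; occupies bytes 0–31 of the 64-byte hash input. |
| word1 | i256 | Second word; occupies bytes 32–63 of the 64-byte hash input. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |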

Annotations

None.
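The 64-byte input that keccak256_pair hashes can be reconstructed from the mstore sequence it replaces: each word is written big-endian, word0 at offset 0 and word1 at offset 32. The Python sketch below builds only that preimage. `pair_preimage` is a hypothetical helper, and the hashing step is left out because Python's standard library provides `hashlib.sha3_256`, which is NIST SHA-3 and not the Keccak-256 variant used by the EVM.

```python
def pair_preimage(word0: int, word1: int) -> bytes:
    """Return the 64 bytes that mstore(0, word0); mstore(32, word1) lays out."""
    return word0.to_bytes(32, "big") + word1.to_bytes(32, "big")


# For a mapping access, word0 is the key and word1 is the declared slot.
preimage = pair_preimage(0x01, 0x02)
print(len(preimage), preimage[31], preimage[63])
```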

keccak256_single

(Expression::Keccak256Single)

Description

Compound hash of a single 256-bit word. Equivalent to mstore(0, word0); keccak256(0, 32) but emitted as a single outlined call after mem_opt’s keccak fusion.

Syntax

keccak256_single($word0[: <type>])

Example

let v1 := keccak256_single(v0)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| word0 | i256 | The word to hash. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.

caller

(Expression::Caller)

Description

Address of the immediate caller of the current call frame.

Syntax

caller()

Example

let v0: i160 := caller()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i160 | Pure |

Annotations

None.

callvalue

(Expression::CallValue)

Description

Value (wei) attached to the current call.

Syntax

callvalue()

Example

let v0 := callvalue()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.

origin

(Expression::Origin)

Description

Address of the original externally owned account that initiated the transaction.

Syntax

origin()

Example

let v0: i160 := origin()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i160 | Pure |

Annotations

None.

address

(Expression::Address)

Description

Address of the contract executing the current call frame.

Syntax

address()

Example

let v0: i160 := address()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i160 | Pure |

Annotations

None.

chainid

(Expression::ChainId)

Description

Chain identifier of the network the contract is executing on.

Syntax

chainid()

Example

let v0 := chainid()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.

gas

(Expression::Gas)

Description

Remaining gas at the point of evaluation. Modeled as a pure expression for IR purposes; in practice it changes between evaluations, so any simplifier that deduplicates pure expressions must respect gas as a barrier.

Syntax

gas()

Example

let v0: i64 := gas()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i64 | Pure (per IR; see Description) |

Annotations

None.

msize

(Expression::MSize)

Description

Highest byte offset of emulated EVM linear memory that has been touched, rounded up to the next 32-byte boundary. Unlike gas, classified as side-effectful by the simplifier: unused msize() bindings are not eliminated, because the result depends on the program’s memory-access history and would change if the surrounding statements were reordered.

Syntax

msize()

Example

let v0: i64 := msize()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i64 | Effectful (see Description) |

Annotations

None.

coinbase

(Expression::Coinbase)

Description

Address of the block’s coinbase (block author).

Syntax

coinbase()

Example

let v0: i160 := coinbase()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i160 | Pure |

Annotations

None.

timestamp

(Expression::Timestamp)

Description

Block timestamp, in seconds since the Unix epoch.

Syntax

timestamp()

Example

let v0: i64 := timestamp()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i64 | Pure |

Annotations

None.

number

(Expression::Number)

Description

Current block number.

Syntax

number()

Example

let v0: i64 := number()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i64 | Pure |

Annotations

None.

difficulty

(Expression::Difficulty)

Description

Pre-merge block difficulty. On post-merge chains this is the block’s prevrandao value.

Syntax

difficulty()

Example

let v0 := difficulty()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.

gaslimit

(Expression::GasLimit)

Description

Block gas limit.

Syntax

gaslimit()

Example

let v0: i64 := gaslimit()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i64 | Pure |

Annotations

None.

basefee

(Expression::BaseFee)

Description

Current block’s EIP-1559 base fee per gas.

Syntax

basefee()

Example

let v0 := basefee()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.

blobbasefee

(Expression::BlobBaseFee)

Description

Current block’s EIP-4844 blob base fee per gas.

Syntax

blobbasefee()

Example

let v0 := blobbasefee()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.

blobhash

(Expression::BlobHash)

Description

Versioned hash of the blob at the given index in the current transaction’s blob list.

Syntax

blobhash($index[: <type>])

Example

let v1 := blobhash(v0)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| index | i256 | Blob index; forward analysis widens to at least i64. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.

blockhash

(Expression::BlockHash)

Description

Hash of the block with the given number. Per EVM, valid only for the most recent 256 blocks; outside that range the result is 0.

Syntax

blockhash($number[: <type>])

Example

let v1 := blockhash(v0)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| number | i256 | Block number; forward analysis widens to i256. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.

selfbalance

(Expression::SelfBalance)

Description

Balance (in wei) of the contract executing the current call frame. Cheaper than balance(address()).

Syntax

selfbalance()

Example

let v0 := selfbalance()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.

gasprice

(Expression::GasPrice)

Description

Effective gas price of the current transaction.

Syntax

gasprice()

Example

let v0 := gasprice()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.

calldataload

(Expression::CallDataLoad)

Description

Read 32 bytes from the current call’s calldata at the given offset. Bytes past the end of calldata read as zero.

Syntax

calldataload($offset[: <type>])

Example

let v1 := calldataload(v0)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| offset | i256 | Byte offset into calldata. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.
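The zero-padding behaviour of calldataload matters mostly for short calldata, such as a bare 4-byte selector. A minimal Python model, assuming the EVM convention that the 32 bytes are interpreted big-endian (the function is illustrative, not the runtime's implementation):

```python
def calldataload(calldata: bytes, offset: int) -> int:
    """Read 32 bytes at `offset`, padding with zeros past the end."""
    chunk = calldata[offset:offset + 32]
    return int.from_bytes(chunk.ljust(32, b"\x00"), "big")


selector_only = bytes.fromhex("a9059cbb")
word = calldataload(selector_only, 0)
print(hex(word >> 224))                  # the selector sits in the top 4 bytes
print(calldataload(selector_only, 64))   # entirely out of range
```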

calldatasize

(Expression::CallDataSize)

Description

Length of the current call’s calldata, in bytes.

Syntax

calldatasize()

Example

let v0: i64 := calldatasize()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i64 | Pure |

Annotations

None.

returndatasize

(Expression::ReturnDataSize)

Description

Length of the most recently returned data buffer from a sub-call, in bytes. Modeled as pure per IR but reflects the last ExternalCall / Create result; consumers must respect that ordering.

Syntax

returndatasize()

Example

let v0: i64 := returndatasize()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i64 | Pure (per IR; see Description) |

Annotations

None.

codesize

(Expression::CodeSize)

Description

Size of the currently executing code, in bytes.

Syntax

codesize()

Example

let v0: i64 := codesize()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| i64 | Pure |

Annotations

None.

extcodesize

(Expression::ExtCodeSize)

Description

Size of the code deployed at the given address, in bytes. Returns 0 for accounts with no deployed code.

Syntax

extcodesize($address[: <type>])

Example

let v1: i64 := extcodesize(v0: i160)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| address | i256 | Account address; forward analysis widens to at least i160. |

Result and purity

| Result | Purity |
| --- | --- |
| i64 | Pure |

Annotations

None.

extcodehash

(Expression::ExtCodeHash)

Description

Keccak-256 hash of the code deployed at the given address. Returns 0 for non-existent accounts.

Syntax

extcodehash($address[: <type>])

Example

let v1 := extcodehash(v0: i160)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| address | i256 | Account address; forward analysis widens to at least i160. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.

balance

(Expression::Balance)

Description

Balance (in wei) of the given account address. Use selfbalance for the contract executing the current call frame (cheaper).

Syntax

balance($address[: <type>])

Example

let v1 := balance(v0: i160)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| address | i256 | Account address; forward analysis widens to at least i160. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

None.

mload

(Expression::MLoad)

Description

Read a 32-byte word from emulated EVM linear memory at offset. The word is read big-endian per EVM semantics. Pure per IR, but reads after writes return the new value; the memory passes track read/write dependencies separately.

Syntax

mload($offset[: <type>]) [/* <region> */]

Example

let v1 := mload(v0: i64)
let v2: i32 := mload(v3: i64) /* free_ptr */

Operands

| Name | Type | Notes |
| --- | --- | --- |
| offset | i256 | Byte offset into linear memory; forward analysis widens to at least i64. |

Result and purity

| Result | Purity |
| --- | --- |
| i32 when region is FreePointerSlot; i256 otherwise | Pure (per IR; see Description) |

Annotations

| Source field | Printed as |
| --- | --- |
| region: MemoryRegion | /* scratch */ · /* free_ptr */ · /* dynamic */ (Unknown is suppressed) |

Same tagging rules as mstore. The region also determines the result width: a load from FreePointerSlot produces an i32 since the FMP fits in a pointer-sized word.
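The big-endian word layout of emulated linear memory can be modelled with a byte array. The Python sketch below is a simplified stand-in: the `LinearMemory` class is hypothetical, grows on demand, and ignores the fixed heap size that applies on PVM.

```python
class LinearMemory:
    """Byte-addressed memory holding 32-byte big-endian words."""

    def __init__(self) -> None:
        self.data = bytearray()

    def _grow(self, end: int) -> None:
        # Zero-fill up to `end`, as untouched EVM memory reads as zero.
        if end > len(self.data):
            self.data.extend(bytes(end - len(self.data)))

    def mstore(self, offset: int, value: int) -> None:
        self._grow(offset + 32)
        word = (value % (1 << 256)).to_bytes(32, "big")
        self.data[offset:offset + 32] = word

    def mload(self, offset: int) -> int:
        self._grow(offset + 32)
        return int.from_bytes(self.data[offset:offset + 32], "big")


mem = LinearMemory()
mem.mstore(0x40, 0x80)          # initialise the free memory pointer
print(hex(mem.mload(0x40)))     # reads the pointer back
print(hex(mem.mload(0x41)))     # unaligned read sees the bytes shifted by one
```

The value 0x80 lands in the last byte of the word (offset 0x5f), which is why a narrow load of the free-pointer slot still has to account for the big-endian layout.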

sload

(Expression::SLoad)

Description

Read a 32-byte word from persistent contract storage at the given key. Pure per IR; reads after writes to the same slot return the new value.

Syntax

sload($key[: <type>]) [/* slot: 0x<hex> */]

Example

let v1 := sload(v0)
let v2 := sload(v3) /* slot: 0x0 */

Operands

| Name | Type | Notes |
| --- | --- | --- |
| key | i256 | Storage slot. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure (per IR; see Description) |

Annotations

| Source field | Printed as |
| --- | --- |
| static_slot: Option<BigUint> | /* slot: 0x<hex> */ when set; suppressed otherwise |

Same tagging rules as sstore. The printer renders the annotation whenever the field is Some, and the deduplicator’s canonicalizer partitions signatures by slot. No pass currently writes Some(...), however, so the annotation does not appear in present-day dumps.

tload

(Expression::TLoad)

Description

Read a 32-byte word from transient storage at the given key. Transient storage is wiped at the end of the transaction; pair with tstore.

Syntax

tload($key[: <type>])

Example

let v1 := tload(v0)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| key | i256 | Transient storage slot. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure (per IR; see Description) |

Annotations

None. The IR does not track a static slot for tload.

mapping_sload

(Expression::MappingSLoad)

Description

Compound load for a Solidity mapping element. Equivalent to mstore(0, key); mstore(32, slot); sload(keccak256(0, 64)), but emitted as a single outlined call after the mapping_access_outlining pass recognizes the pattern. The pass fuses a keccak256_pair (itself produced by mem_opt’s keccak fusion) with a following sload whose key has a single consumer. Only valid when the intermediate hash is used exclusively by this load.

Syntax

mapping_sload($key[: <type>], $slot[: <type>])

Example

let v2 := mapping_sload(v0: i160, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| key | i256 | Mapping key; often narrowed to i160 for address keys. |
| slot | i256 | The mapping’s declared storage slot. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure (per IR; see Description) |

Annotations

None. The fused statement’s effective storage slot is the keccak hash of the key and the declared slot, which is never a compile-time constant; no static_slot hint is surfaced.

dataoffset

(Expression::DataOffset)

Description

Offset of a named data segment within the deployed code. The identifier is a string carried in the IR’s id: String field; the linker resolves it to a concrete offset.

Syntax

dataoffset("<id>")

Example

let v0 := dataoffset("MyContract_deployed")

Operands

None — the identifier is a quoted string literal in the syntax position, not an operand.

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

| Source field | Printed as |
| --- | --- |
| id: String | The quoted identifier in the syntax position (not a comment annotation; it is the expression itself). |

datasize

(Expression::DataSize)

Description

Size of a named data segment within the deployed code, in bytes. The identifier is resolved by the linker.

Syntax

datasize("<id>")

Example

let v0: i64 := datasize("MyContract_deployed")

Operands

None — the identifier is a quoted string literal in the syntax position, not an operand.

Result and purity

| Result | Purity |
| --- | --- |
| i64 | Pure |

Annotations

| Source field | Printed as |
| --- | --- |
| id: String | The quoted identifier in the syntax position. |

loadimmutable

(Expression::LoadImmutable)

Description

Read the value of a named immutable variable. Immutables are written once during contract construction by SetImmutable and read afterwards via this expression.

Syntax

loadimmutable("<key>")

Example

let v0 := loadimmutable("MyContract.owner")

Operands

None — the key is a quoted string literal in the syntax position.

Result and purity

| Result | Purity |
| --- | --- |
| i256 | Pure |

Annotations

| Source field | Printed as |
| --- | --- |
| key: String | The quoted identifier in the syntax position. |

linkersymbol

(Expression::LinkerSymbol)

Description

Address of an external library, resolved by the linker. The path encodes the library’s source location and identifier.

Syntax

linkersymbol("<path>")

Example

let v0: i160 := linkersymbol("contracts/Library.sol:L")

Operands

None — the path is a quoted string literal in the syntax position.

Result and purity

| Result | Purity |
| --- | --- |
| i160 | Pure |

Annotations

| Source field | Printed as |
| --- | --- |
| path: String | The quoted path in the syntax position. |

<func_name>

(Expression::Call; the printer emits func_<id> when no function name is registered)

Description

Internal function call. Invokes a user-defined function declared earlier in the same object; the mnemonic is the function’s Yul-level name, or func_<id> if the printer has no name registered for the FunctionId. Distinct from call and the other EVM call-opcode statements, which cross the contract boundary.

Syntax

<func_name>([$argument_0[: <type>], $argument_1[: <type>], …])

Example

let v3 := abi_decode_uint256(v0, v1, v2)
let v4, v5 := returns_two(v0)           // multi-return via let multi-binding

Operands

| Name | Type | Notes |
| --- | --- | --- |
| arguments | Vec<Value> | Zero or more argument values, in declaration order; each operand may carry a : <type> suffix. |

Result and purity

| Result | Purity |
| --- | --- |
| One or more values, widths taken from the callee’s declared return types (or the inferred return widths, narrowed via the interprocedural pass). Falls back to i256 when the callee’s returns are unknown to type inference. | Effectful: the simplifier treats every call as side-effectful regardless of callee body, so unused call bindings are not DCE’d. The transitive purity of the callee is not tracked at the IR level. |

Annotations

Source fieldPrinted as
function: FunctionIdThe callee’s name in the syntax position (or func_<id> if the printer has no name registered).

Memory and storage writes

The operations in this section all modify external state: emulated EVM linear memory, persistent storage, or transient storage. They are statements (not expressions) and they are never pure. Simplification and deduplication never reorder them with respect to each other or with respect to reverts; the memory passes treat them as the side-effect boundary for their analyses.

mstore

(Statement::MStore)

Description

Write a 32-byte word to emulated EVM linear memory at offset. The word is stored big-endian, matching EVM semantics; the codegen handles the byte swap on PolkaVM’s little-endian RISC-V target.

Syntax

mstore($offset[: <type>], $value[: <type>]) [/* <region> */]

Example

mstore(v0, v1)                    // Unknown region; no annotation printed
mstore(v2, v3) /* scratch */      // offset proven to land in 0x00..0x3f
mstore(v4, v5) /* free_ptr */     // offset is exactly 0x40

Operands

| Name | Type | Notes |
| --- | --- | --- |
| offset | i256 | Byte offset into linear memory; forward analysis widens to at least i64. |
| value | i256 | The 32-byte word to store. Narrower values are zero-extended at codegen time. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful |

Annotations

| Source field | Printed as |
| --- | --- |
| region: MemoryRegion | /* scratch */ · /* free_ptr */ · /* dynamic */ (Unknown is suppressed) |

Assigned at translation time from the constant offset (if any); consumed by mem_opt, FMP propagation, and byte-swap mode selection.

mstore8

(Statement::MStore8)

Description

Write a single byte to emulated EVM linear memory at offset. The low 8 bits of value are stored; the upper bits are ignored. The operation is otherwise identical to mstore: same operand shape, same region tag, same side-effect classification.

Syntax

mstore8($offset[: <type>], $value[: <type>]) [/* <region> */]

Example

mstore8(v0, v1: i8)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| offset | i256 | Byte offset into linear memory; forward analysis widens to at least i64. |
| value | i256 | Only the low 8 bits are stored. Often narrowed to i8 by type inference. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful |

Annotations

| Source field | Printed as |
| --- | --- |
| region: MemoryRegion | /* scratch */ · /* free_ptr */ · /* dynamic */ (Unknown is suppressed) |

Same tagging rules as mstore. Most mstore8s carry an Unknown region in practice because single-byte writes typically target offsets the translator cannot prove constant.
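The difference between mstore and mstore8 can be modeled in a few lines of Python. This is a minimal sketch of the EVM-visible semantics only, not of the revive codegen; the function names and the fixed-size bytearray standing in for linear memory are assumptions made for illustration.

```python
WORD_MASK = (1 << 256) - 1


def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Store a 256-bit word big-endian, as EVM semantics require."""
    memory[offset:offset + 32] = (value & WORD_MASK).to_bytes(32, "big")


def mstore8(memory: bytearray, offset: int, value: int) -> None:
    """Store only the low 8 bits of value; the upper bits are ignored."""
    memory[offset] = value & 0xFF


memory = bytearray(64)          # hypothetical 64-byte memory, all zeros
mstore(memory, 0, 0x1234)       # the word's low bytes land at offsets 30 and 31
mstore8(memory, 32, 0xABCD)     # only 0xCD is written
print(memory[30:33].hex())      # 1234cd
```

The sketch assumes every access stays within the preallocated buffer; memory growth is out of scope here.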

mcopy

(Statement::MCopy)

Description

Copy length bytes from src to dest within emulated EVM linear memory. The Yul builtin mcopy maps directly onto this statement; unlike mstore, it does not carry a region tag because the source and destination ranges may straddle multiple regions.

Syntax

mcopy($dest[: <type>], $src[: <type>], $length[: <type>])

Example

mcopy(v0, v1, v2)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| dest | i256 | Destination byte offset in linear memory. |
| src | i256 | Source byte offset in linear memory. |
| length | i256 | Number of bytes to copy. Overlapping ranges follow EVM-defined memmove semantics. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful |

Annotations

None. mcopy carries no region tag because the source and destination ranges may straddle multiple regions, and no static-slot hint because the copy is not storage-bound.
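The memmove behavior for overlapping ranges can be illustrated with a small Python model. It is a sketch under the assumption that both ranges lie inside a preallocated buffer; the function name merely mirrors the statement and is not part of any revive API.

```python
def mcopy(memory: bytearray, dest: int, src: int, length: int) -> None:
    """Copy length bytes within memory; overlapping ranges behave like memmove."""
    # Snapshot the source first, so writes to an overlapping destination
    # cannot corrupt bytes that are still to be copied.
    snapshot = bytes(memory[src:src + length])
    memory[dest:dest + length] = snapshot


memory = bytearray(range(8))   # 00 01 02 03 04 05 06 07
mcopy(memory, 2, 0, 4)         # the destination overlaps the source
print(memory.hex())            # 0001000102030607
```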

sstore

(Statement::SStore)

Description

Write a 32-byte word to persistent contract storage at key. The operation is the durable counterpart of mstore: the value survives across transactions and is observable to subsequent calls to the contract.

Syntax

sstore($key[: <type>], $value[: <type>]) [/* slot: 0x<hex> */]

Example

sstore(v0, v1)
sstore(v2, v3) /* slot: 0x0 */

Operands

| Name | Type | Notes |
| --- | --- | --- |
| key | i256 | Storage slot. May be a constant slot, a keccak-derived slot for mappings or dynamic arrays, or an arbitrary expression. |
| value | i256 | The 256-bit word to store. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful |

Annotations

| Source field | Printed as |
| --- | --- |
| static_slot: Option<BigUint> | /* slot: 0x<hex> */ when set; suppressed otherwise |

The printer renders the annotation whenever the field is Some, and the deduplicator’s canonicalizer and mapping-fusion analyses consume it as part of the signature. No pass currently writes Some(...), so the annotation is dormant in present-day dumps; when absent, alias and dedup analyses fall back to the conservative “may alias any slot” assumption.

tstore

(Statement::TStore)

Description

Write a 32-byte word to transient storage at key. Transient storage is wiped at the end of the transaction, so tstore is the right primitive for per-transaction bookkeeping (reentrancy guards, cached results) without the gas cost of sstore on EVM. On PolkaVM the transient backing store is provided by pallet-revive.

Syntax

tstore($key[: <type>], $value[: <type>])

Example

tstore(v0, v1)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| key | i256 | Transient storage slot. |
| value | i256 | The 256-bit word to store. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful |

Annotations

None. Unlike sstore, the IR does not track a static slot for tstore: transient storage’s short-lived lifetime makes the slot-aware optimizations less valuable, and the translator does not produce the annotation.

mapping_sstore

(Statement::MappingSStore)

Description

Compound store for a Solidity mapping element. Equivalent to mstore(0, key); mstore(32, slot); sstore(keccak256(0, 64), value) but emitted as a single outlined statement after the mapping_access_outlining pass recognizes the pattern (it fuses a keccak256_pair followed by an sstore whose key has a single consumer). Only valid when the intermediate hash is not observed by any other statement.

Syntax

mapping_sstore($key[: <type>], $slot[: <type>], $value[: <type>])

Example

mapping_sstore(v0, v1, v2)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| key | i256 | Mapping key. The outlining pass force-widens it to i256, so it always prints at full width, even for address keys. |
| slot | i256 | The mapping’s declared storage slot. Typically a small constant. |
| value | i256 | The value to store at the computed storage location. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful |

Annotations

None. mapping_sstore deliberately drops the static_slot annotation that the original sstore may have carried, because the fused statement’s effective slot is the keccak hash of the key and the declared slot, which is never a compile-time constant.
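The equivalence stated in the description can be checked with a small Python model that runs the unfused three-statement sequence and the fused statement side by side. This is only a sketch: the helper names are hypothetical, storage is a plain dictionary, and the hash function is a parameter because Python's standard library has no keccak256 (hashlib.sha3_256 uses different padding and serves purely as a stand-in here).

```python
import hashlib
from typing import Callable, Dict

HashFn = Callable[[bytes], int]


def word(value: int) -> bytes:
    """Encode an integer as a 32-byte big-endian word."""
    return value.to_bytes(32, "big")


def unfused_store(memory: bytearray, storage: Dict[int, int],
                  key: int, slot: int, value: int, hash_fn: HashFn) -> None:
    """mstore(0, key); mstore(32, slot); sstore(keccak256(0, 64), value)."""
    memory[0:32] = word(key)
    memory[32:64] = word(slot)
    storage[hash_fn(bytes(memory[0:64]))] = value


def mapping_sstore(storage: Dict[int, int],
                   key: int, slot: int, value: int, hash_fn: HashFn) -> None:
    """Fused form: one statement with the same storage effect."""
    storage[hash_fn(word(key) + word(slot))] = value


def stand_in_hash(data: bytes) -> int:
    # NOT keccak256: SHA3-256 is used only so the sketch runs with the standard library.
    return int.from_bytes(hashlib.sha3_256(data).digest(), "big")


unfused, fused = {}, {}
unfused_store(bytearray(64), unfused, key=0xCAFE, slot=3, value=42, hash_fn=stand_in_hash)
mapping_sstore(fused, key=0xCAFE, slot=3, value=42, hash_fn=stand_in_hash)
print(unfused == fused)   # True
```

Both paths hash the same 64-byte preimage (key word followed by slot word), which is why the fused statement needs the intermediate hash to be unobserved elsewhere.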

Bulk copies

Multi-byte memory copies from the EVM-accessible byte sources (code, external code, returndata, embedded data, and calldata) into emulated EVM linear memory. They all take the same shape: a destination memory offset, a source offset, and a length. They are effectful and act as opaque barriers to the memory passes.

codecopy

(Statement::CodeCopy)

Description

Copy length bytes from the currently executing code at offset into emulated EVM linear memory at dest. Reads past the end of code yield zero bytes.

Syntax

codecopy($dest[: <type>], $offset[: <type>], $length[: <type>])

Example

codecopy(v0, v1, v2)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| dest | i256 | Destination byte offset in linear memory. |
| offset | i256 | Source byte offset in the executing code. |
| length | i256 | Number of bytes to copy. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful |

Annotations

None.

extcodecopy

(Statement::ExtCodeCopy)

Description

Copy length bytes from the code at address starting at offset into emulated EVM linear memory at dest. Reads beyond the code yield zero bytes; non-existent accounts yield all zeros.

Syntax

extcodecopy($address[: <type>], $dest[: <type>], $offset[: <type>], $length[: <type>])

Example

extcodecopy(v0: i160, v1, v2, v3)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| address | i256 | Account whose code to read; forward analysis widens to at least i160. |
| dest | i256 | Destination byte offset in linear memory. |
| offset | i256 | Source byte offset in the external code. |
| length | i256 | Number of bytes to copy. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful |

Annotations

None.

returndatacopy

(Statement::ReturnDataCopy)

Description

Copy length bytes from the most recent sub-call’s return data starting at offset into emulated EVM linear memory at dest. Per EVM, reads past the return data’s end revert; the memory passes treat this as a potential trap site.

Syntax

returndatacopy($dest[: <type>], $offset[: <type>], $length[: <type>])

Example

returndatacopy(v0, v1, v2)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| dest | i256 | Destination byte offset in linear memory. |
| offset | i256 | Source byte offset in the return-data buffer. |
| length | i256 | Number of bytes to copy. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful (may revert on out-of-range reads, per EVM) |

Annotations

None.

datacopy

(Statement::DataCopy)

Description

Copy length bytes from an embedded data segment starting at offset into emulated EVM linear memory at dest. The source segment is resolved by the linker, typically used to pull constants compiled into the bytecode into runtime memory.

Syntax

datacopy($dest[: <type>], $offset[: <type>], $length[: <type>])

Example

datacopy(v0, v1, v2)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| dest | i256 | Destination byte offset in linear memory. |
| offset | i256 | Source byte offset in the data segment. |
| length | i256 | Number of bytes to copy. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful |

Annotations

None.

calldatacopy

(Statement::CallDataCopy)

Description

Copy length bytes from the current call’s calldata starting at offset into emulated EVM linear memory at dest. Reads past the end of calldata yield zero bytes.

Syntax

calldatacopy($dest[: <type>], $offset[: <type>], $length[: <type>])

Example

calldatacopy(v0, v1, v2)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| dest | i256 | Destination byte offset in linear memory. |
| offset | i256 | Source byte offset in calldata. |
| length | i256 | Number of bytes to copy. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful |

Annotations

None.
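The bulk copies in this section differ mainly in how they treat reads past the end of the source: codecopy and calldatacopy pad with zero bytes, whereas returndatacopy reverts. The Python sketch below contrasts the two behaviors; the function and exception names are hypothetical, and a fixed-size bytearray stands in for linear memory.

```python
class OutOfBounds(Exception):
    """Stands in for the failure of an out-of-range returndatacopy."""


def copy_zero_padded(memory: bytearray, dest: int, source: bytes,
                     offset: int, length: int) -> None:
    """codecopy / calldatacopy style: bytes past the end of source read as zero."""
    chunk = source[offset:offset + length]
    memory[dest:dest + length] = chunk.ljust(length, b"\x00")


def copy_checked(memory: bytearray, dest: int, source: bytes,
                 offset: int, length: int) -> None:
    """returndatacopy style: reading past the end of source is an error."""
    if offset + length > len(source):
        raise OutOfBounds(f"read of {length} bytes at {offset} exceeds {len(source)}")
    memory[dest:dest + length] = source[offset:offset + length]


calldata = bytes.fromhex("aabbcc")
memory = bytearray(b"\xff" * 8)
copy_zero_padded(memory, 0, calldata, 1, 4)   # two real bytes, then two zero bytes
print(memory.hex())                            # bbcc0000ffffffff
```

Calling copy_checked with the same arguments raises OutOfBounds, which is why the memory passes treat returndatacopy as a potential trap site.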

Bindings and wrappers

The statements that bind SSA values, hold loose expressions evaluated for their side effects, and write to immutable storage. Every pure expression in this reference’s earlier sections appears on the right-hand side of one of these statements (almost always let).

let

(Statement::Let)

Description

SSA binding: evaluate an expression and bind its result(s) to a list of fresh value ids. The let statement is the only mechanism by which pure expressions enter the value namespace; every v<id> in a dump was produced by a let (or by a value-yielding control-flow statement or by a parameter at function entry).

Syntax

let $binding_0[, $binding_1, …] := $expression

Example

let v3 := add(v0, v1)
let v4, v5 := if v2 [v0, v1] { … } else { … }   // multi-binding from a value-yielding If

Operands

| Name | Type | Notes |
| --- | --- | --- |
| bindings | Vec<ValueId> | One or more fresh SSA ids to bind. Most expressions produce one value; control-flow statements may produce several. |
| value | Expression | The right-hand side; see any of the Pure expression entries. |

Result and purity

| Result | Purity |
| --- | --- |
| None directly; the bound ids carry the expression’s result(s) | Effectful (binding establishment); the right-hand side’s purity is independent |

Annotations

None.

Expression statement

(Statement::Expression)

Description

Wraps an expression evaluated for its observable consequences but whose value is not bound. Typically a zero-return (void) user-defined function call (Expression::Call) evaluated for its side effects, or the discarded void result of a Yul builtin used as a statement (a value-producing expression is bound by a let instead). EVM external calls (call, delegatecall, etc.) and contract creation (create, create2) translate to dedicated Statement::ExternalCall and Statement::Create variants, not through this wrapper.

Syntax

$expression

Example

update_balance(v0)          // void function called for its side effects

Operands

| Name | Type | Notes |
| --- | --- | --- |
| expression | Expression | Any expression; result is discarded. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful (per its statement position) |

Annotations

None.

setimmutable

(Statement::SetImmutable)

Description

Write an immutable variable during contract construction. Immutables are written once in the constructor and read later via loadimmutable. The key is a string identifier resolved by the linker.

Syntax

setimmutable("<key>", $value[: <type>])

Example

setimmutable("MyContract.owner", v0)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| value | i256 | The value to store; the key is a quoted string literal in the syntax position. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful |

Annotations

| Source field | Printed as |
| --- | --- |
| key: String | The quoted identifier in the syntax position. |

Structured control flow

The IR’s control flow is structured: if, switch, and for are statements with explicit nested regions, each carrying input values and yielding output values. The jump-like statements (break, continue, leave) are scoped to their nearest enclosing construct. Nested blocks create lexical scope without otherwise changing control flow.

if

(Statement::If)

Description

Conditional execution with optional value yields. The then region runs when condition is non-zero; the else region runs otherwise. If outputs is non-empty, both regions must yield the same number of values and the statement is bound by a let.

Syntax

if $condition[: <type>] [[$input_0, $input_1, …]] { … } [else { … }]

Example

if v0: i1 {
    sstore(v1, v2)
}

let v5, v6 := if v3: i1 [v1, v2] {
    let v7: i64 := 0x1          // add widens its operands to the i64 register width
    let v8 := add(v2, v7: i64)
    yield v1, v8
} else {
    yield v1, v2
}

Operands

| Name | Type | Notes |
| --- | --- | --- |
| condition | i256 | Branch selector; non-zero takes the then region. Often narrowed to i1. |
| inputs | Vec<Value> | Values threaded into both regions, printed in square brackets after the condition. |
| (regions) | | The then_region is mandatory; the else_region is optional and, when absent, implicitly yields the inputs unchanged. |

Result and purity

| Result | Purity |
| --- | --- |
| None for the statement form; for the value-yielding form, one value per outputs binding, types taken from the yielded values | Effectful (control flow) |

Annotations

None.
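The rule that an absent else region implicitly yields the inputs unchanged can be sketched as a tiny Python evaluator. Everything here is illustrative: run_if and the Region type are hypothetical names, regions are modeled as plain functions from inputs to yielded values, and 256-bit wrap-around is ignored.

```python
from typing import Callable, Optional, Sequence, Tuple

Region = Callable[[Sequence[int]], Tuple[int, ...]]


def run_if(condition: int, inputs: Sequence[int],
           then_region: Region, else_region: Optional[Region] = None) -> Tuple[int, ...]:
    """Evaluate a value-yielding if: a non-zero condition selects the then region."""
    if condition != 0:
        return then_region(inputs)
    if else_region is None:
        # An absent else region implicitly yields the inputs unchanged.
        return tuple(inputs)
    return else_region(inputs)


def then_region(values: Sequence[int]) -> Tuple[int, ...]:
    # Mirrors the example above: yield v1 and v2 + 1.
    return (values[0], values[1] + 1)


print(run_if(1, [7, 9], then_region))   # (7, 10)
print(run_if(0, [7, 9], then_region))   # (7, 9)
```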

switch

(Statement::Switch)

Description

Multi-way dispatch on a scrutinee value. Each case matches a specific constant and runs its region; an optional default region catches non-matching values. Like if, switch may yield values via outputs and accept thread-through values via inputs.

Syntax

switch $scrutinee[: <type>] [[$input_0, …]]
case 0x<hex> {
    …
}
[case 0x<hex> {
    …
} …]
[default {
    …
}]

Example

switch v0
case 0x0 {
    sstore(v1, v2)
}
case 0x1 {
    sstore(v1, v3)
}
default {
    invalid()
}

Operands

| Name | Type | Notes |
| --- | --- | --- |
| scrutinee | i256 | The value to compare against each case. |
| inputs | Vec<Value> | Values threaded into every case and default region. |
| cases | Vec<SwitchCase> | Each case carries a constant value: BigUint and a region. |
| (default) | | Optional fall-through region. |

Result and purity

| Result | Purity |
| --- | --- |
| None for the statement form; one value per outputs binding for the value-yielding form | Effectful (control flow) |

Annotations

None.

for

(Statement::For)

Description

Structured loop with explicit loop-carried variables. Each iteration evaluates condition_statements followed by condition; if the condition is non-zero, the body region runs, then the post region runs, and the loop iterates. Loop-carried variables are passed as SSA values through each region. break exits the loop and continue jumps to the post region.

Syntax

for { $variable_0 := $initial_0[, …] }
    [// condition statements:
        …]
    condition: $condition
    post [($post_input_variable_0[, …])] {
        …
    }
    body {
        … body …
    }

Example

let v0: i1 := 0x0
let v6 := for { v1 := v0: i1 }
    // condition statements:
    let v2: i8 := 0xa
    condition: lt(v1, v2: i8)
    post (v3) {
        let v4: i64 := 0x1
        let v5 := add(v3, v4: i64)
        yield v5
    }
    body {
        sstore(v1, v1)
        0x0: void
        yield v1
    }

Operands

| Name | Type | Notes |
| --- | --- | --- |
| initial_values | Vec<Value> | Starting values for the loop-carried variables. |
| loop_variables | Vec<ValueId> | SSA ids visible inside condition, body, and post. |
| condition_statements | Vec<Statement> | Statements evaluated each iteration before the condition expression; emitted into the loop header block. Printed only when non-empty, behind a // condition statements: comment. |
| condition | Expression | Re-evaluated each iteration; non-zero continues, zero exits. |
| body | Region | Loop body; yields current loop-carried values. |
| post_input_variables | Vec<ValueId> | Input SSA ids for the post region (one per loop-carried variable); receive the body’s yielded values merged with continue-site values via phi nodes in the LLVM codegen. |
| post | Region | Runs after each body iteration (and after continue); yields updated loop-carried values. |
| outputs | Vec<ValueId> | Final loop-carried values after exit. |

Result and purity

| Result | Purity |
| --- | --- |
| None for the statement form; one value per outputs binding for the value-yielding form | Effectful (control flow) |

Annotations

None.
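The evaluation order of a for statement (condition, then body, then post) can be modeled with a short Python driver. This is a simplified sketch: run_for is a hypothetical name, regions are modeled as functions over tuples of loop-carried values, and break and continue are left out.

```python
from typing import Callable, Dict, Tuple

Values = Tuple[int, ...]


def run_for(initial: Values,
            condition: Callable[[Values], int],
            body: Callable[[Values], Values],
            post: Callable[[Values], Values]) -> Values:
    """Run condition -> body -> post until the condition evaluates to zero."""
    loop_vars = initial
    while condition(loop_vars) != 0:
        yielded = body(loop_vars)      # the body yields the current loop-carried values
        loop_vars = post(yielded)      # post receives them and yields the updated ones
    return loop_vars                   # the outputs: final loop-carried values


storage: Dict[int, int] = {}


def body(values: Values) -> Values:
    storage[values[0]] = values[0]     # sstore(v1, v1)
    return values


# Mirrors the example above: start at 0, loop while v1 < 10, add 1 in post.
final = run_for((0,), lambda v: int(v[0] < 10), body, lambda v: (v[0] + 1,))
print(final, len(storage))             # (10,) 10
```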

break

(Statement::Break)

Description

Exit the innermost enclosing for loop. Carries the current values of loop-carried variables at the break point; these become the loop’s outputs.

Syntax

break [[$value_0, $value_1, …]]

Example

if v0 { break [v1, v2] }

Operands

The loop-carried values (Vec<Value>) are printed in brackets when non-empty (e.g. break [v1, v2]).

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful (control flow) |

Annotations

None.

continue

(Statement::Continue)

Description

Skip to the post region of the innermost enclosing for loop. Like break, carries the current values of loop-carried variables internally.

Syntax

continue [[$value_0, $value_1, …]]

Example

if v0 { continue [v1, v2] }

Operands

The loop-carried values (Vec<Value>) are printed in brackets when non-empty (e.g. continue [v1, v2]).

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful (control flow) |

Annotations

None.

leave

(Statement::Leave)

Description

Exit the current function, returning the listed values as the function’s return values. The Yul-level leave keyword translates directly to this statement; the inlining pass eliminates intra-function leaves where possible via the exit-flag transformation.

Syntax

leave [[$value_0[: <type>], $value_1[: <type>], …]]

Example

leave [v0, v1]              // returns v0 and v1 from the function
leave                       // returns nothing (void function)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| return_values | Vec<Value> | Empty for void functions; otherwise one entry per declared return. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful (control flow) |

Annotations

None.

Nested block

(Statement::Block)

Description

A lexical scope without conditional or iterative behavior. The body is a region; control falls through after the region’s statements complete. Used to bound the visibility of inner bindings.

Syntax

{
    …
}

Example

{
    let v0 := add(v1, v2)
    sstore(v3, v0)
}                           // v0 is no longer in scope here

Operands

None — the body is a region, not an operand.

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful (per the body’s contents) |

Annotations

None.

External interaction

Statements that cross the contract boundary: external calls, contract creation, and event log emission. All produce or rely on external state and act as barriers to memory and storage analyses.

call

(Statement::ExternalCall with CallKind::Call)

Description

Standard external call that may transfer value. Reads args_length bytes from emulated EVM linear memory at args_offset as calldata, executes the target, and writes up to ret_length bytes of return data into linear memory at ret_offset. The boolean result indicates success.

Syntax

let $result := call($gas[: <type>], $address[: <type>], $value[: <type>], $args_offset[: <type>], $args_length[: <type>], $ret_offset[: <type>], $ret_length[: <type>])

Example

let v8 := call(v0: i64, v1: i160, v2, v3: i64, v4: i64, v5: i64, v6: i64)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| gas | i256 | Gas to forward to the target; forward analysis widens to at least i64. |
| address | i256 | Callee address; forward analysis widens to at least i160. |
| value | i256 | Wei to transfer with the call. |
| args_offset | i256 | Calldata source offset in linear memory; forward analysis widens to at least i64. |
| args_length | i256 | Calldata length in bytes; forward analysis widens to at least i64. |
| ret_offset | i256 | Return-data destination offset in linear memory; forward analysis widens to at least i64. |
| ret_length | i256 | Maximum return-data length; forward analysis widens to at least i64. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 (success flag: 1 on success, 0 on revert/error; narrowable to i1) | Effectful |

Annotations

None.
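The phrase "up to ret_length bytes" means the copy into linear memory is bounded by both ret_length and the actual size of the return data. A minimal Python sketch of that bound follows, assuming a preallocated buffer; write_return_data is a hypothetical helper, not a revive function.

```python
def write_return_data(memory: bytearray, ret_offset: int, ret_length: int,
                      return_data: bytes) -> int:
    """Copy at most ret_length bytes of return data to ret_offset; return the count."""
    count = min(ret_length, len(return_data))
    memory[ret_offset:ret_offset + count] = return_data[:count]
    return count


memory = bytearray(b"\xff" * 8)
copied = write_return_data(memory, 0, 4, bytes.fromhex("010203040506"))
print(copied, memory.hex())   # 4 01020304ffffffff
```

When the callee returns fewer bytes than ret_length, the remaining destination bytes are left untouched.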

callcode

(Statement::ExternalCall with CallKind::CallCode)

Description

Deprecated EVM opcode that executes the callee’s code against the caller’s storage but, unlike delegatecall, does not preserve the original sender and call value. Not supported by the newyork backend (codegen rejects it); use delegatecall instead.

Syntax

let $result := callcode($gas[: <type>], $address[: <type>], $value[: <type>], $args_offset[: <type>], $args_length[: <type>], $ret_offset[: <type>], $ret_length[: <type>])

Example

let v8 := callcode(v0: i64, v1: i160, v2, v3: i64, v4: i64, v5: i64, v6: i64)

Operands

Same shape as call.

Result and purity

| Result | Purity |
| --- | --- |
| i256 (success flag; narrowable to i1) | Effectful |

Annotations

None.

delegatecall

(Statement::ExternalCall with CallKind::DelegateCall)

Description

Execute the callee’s code in the caller’s context: same storage, same sender, same call value. The standard mechanism for library calls and proxy patterns. No value operand (the caller’s call value is inherited).

Syntax

let $result := delegatecall($gas[: <type>], $address[: <type>], $args_offset[: <type>], $args_length[: <type>], $ret_offset[: <type>], $ret_length[: <type>])

Example

let v7 := delegatecall(v0: i64, v1: i160, v2: i64, v3: i64, v4: i64, v5: i64)

Operands

Same shape as call minus the value operand.

Result and purity

| Result | Purity |
| --- | --- |
| i256 (success flag; narrowable to i1) | Effectful |

Annotations

None.

staticcall

(Statement::ExternalCall with CallKind::StaticCall)

Description

Read-only external call. Any state modification in the callee (including nested calls) causes the call to revert. No value operand.

Syntax

let $result := staticcall($gas[: <type>], $address[: <type>], $args_offset[: <type>], $args_length[: <type>], $ret_offset[: <type>], $ret_length[: <type>])

Example

let v7 := staticcall(v0: i64, v1: i160, v2: i64, v3: i64, v4: i64, v5: i64)

Operands

Same shape as call minus the value operand.

Result and purity

| Result | Purity |
| --- | --- |
| i256 (success flag; narrowable to i1) | Effectful (no state writes, but still an external boundary and may revert) |

Annotations

None.

create

(Statement::Create with CreateKind::Create)

Description

Deploy a new contract with the given init-code bytes, transferring value wei from the caller. The new contract’s address is derived from the caller’s address and nonce; on failure the result is 0.

Syntax

let $result := create($value[: <type>], $offset[: <type>], $length[: <type>])

Example

let v4 := create(v0, v1: i64, v2: i64)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| value | i256 | Wei to transfer to the new contract. |
| offset | i256 | Linear-memory offset of the init code; forward analysis widens to at least i64. |
| length | i256 | Length of the init code in bytes; forward analysis widens to at least i64. |

Result and purity

| Result | Purity |
| --- | --- |
| i256 (created address; narrowable to i160 on success, 0 on failure) | Effectful |

Annotations

None.

create2

(Statement::Create with CreateKind::Create2)

Description

Deploy a new contract with a deterministic address derived from the caller’s address, the salt, and the init-code hash. Same operand shape as create plus an additional salt.

Syntax

let $result := create2($value[: <type>], $offset[: <type>], $length[: <type>], $salt[: <type>])

Example

let v5 := create2(v0, v1: i64, v2: i64, v3)

Operands

Same as create plus salt: i256.

Result and purity

| Result | Purity |
| --- | --- |
| i256 (created address; narrowable to i160 on success, 0 on failure) | Effectful |

Annotations

None.

log<N>

(Statement::Log)

Description

Emit an event log entry. The mnemonic suffix <N> is the number of indexed topics (0 through 4), determined by the length of the IR’s topics field. The data portion is read from length bytes of emulated EVM linear memory at offset.

Syntax

log<N>($offset[: <type>], $length[: <type>][, $topic_0[: <type>], …])

Example

log0(v0: i64, v1: i64)                    // no topics
log2(v0: i64, v1: i64, v2, v3)            // two topics

Operands

| Name | Type | Notes |
| --- | --- | --- |
| offset | i256 | Data source offset in linear memory; forward analysis widens to at least i64. |
| length | i256 | Data length in bytes; forward analysis widens to at least i64. |
| topics | Vec<Value> | Zero to four indexed topic values; the length determines the mnemonic suffix. |

Result and purity

| Result | Purity |
| --- | --- |
| None | Effectful |

Annotations

None.
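How the mnemonic suffix follows from the number of topics can be shown with a small formatting helper. This is an illustrative Python sketch only; format_log is a hypothetical name and takes operands that are already rendered as strings, so it says nothing about how the actual printer is implemented.

```python
from typing import Sequence


def format_log(offset: str, length: str, topics: Sequence[str]) -> str:
    """Print a log statement; the mnemonic suffix is the number of topics."""
    if len(topics) > 4:
        raise ValueError("a log entry carries at most four topics")
    operands = ", ".join([offset, length, *topics])
    return f"log{len(topics)}({operands})"


print(format_log("v0: i64", "v1: i64", []))             # log0(v0: i64, v1: i64)
print(format_log("v0: i64", "v1: i64", ["v2", "v3"]))   # log2(v0: i64, v1: i64, v2, v3)
```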

Termination

Statements that end the current call frame. They fall into three groups: plain forms (return, revert, stop), unconditional traps (invalid, selfdestruct), and outlined revert variants (panic_revert, error_string_revert, custom_error_revert) that encode common Solidity error patterns into single nodes that can be deduplicated across call sites.

return

(Statement::Return)

Description

End the current call frame successfully, returning length bytes from emulated EVM linear memory at offset as the return data.

Syntax

return($offset[: <type>], $length[: <type>])

Example

return(v0: i64, v1: i64)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| offset | i256 | Return-data source offset; forward analysis widens to at least i64. |
| length | i256 | Return-data length; forward analysis widens to at least i64. |

Result and purity

| Result | Purity |
| --- | --- |
| None (terminates the call frame) | Effectful (terminator) |

Annotations

None.

revert

(Statement::Revert)

Description

End the current call frame with a revert, undoing all state changes made during the call, and returning length bytes of revert data from emulated EVM linear memory at offset.

Syntax

revert($offset[: <type>], $length[: <type>])

Example

revert(v0: i64, v1: i64)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| offset | i256 | Revert-data source offset; forward analysis widens to at least i64. |
| length | i256 | Revert-data length; forward analysis widens to at least i64. |

Result and purity

| Result | Purity |
| --- | --- |
| None (terminates the call frame) | Effectful (terminator) |

Annotations

None.

stop

(Statement::Stop)

Description

End the current call frame successfully with empty return data.

Syntax

stop()

Example

stop()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| None (terminates the call frame) | Effectful (terminator) |

Annotations

None.

invalid

(Statement::Invalid)

Description

Unconditional invalid-opcode trap. Consumes all remaining gas and reverts. Used for unreachable branches and assertion failures.

Syntax

invalid()

Example

invalid()

Operands

None.

Result and purity

| Result | Purity |
| --- | --- |
| None (terminates the call frame) | Effectful (terminator) |

Annotations

None.

selfdestruct

(Statement::SelfDestruct)

Description

End the current call frame and transfer the contract’s remaining balance to address. Post-Cancun, the contract storage is not deleted (selfdestruct is effectively deprecated; the opcode still exists for legacy compatibility).

Syntax

selfdestruct($address[: <type>])

Example

selfdestruct(v0: i160)

Operands

| Name | Type | Notes |
| --- | --- | --- |
| address | i256 | Recipient of the contract’s balance; forward analysis widens to at least i160. |

Result and purity

| Result | Purity |
| --- | --- |
| None (terminates the call frame) | Effectful (terminator) |

Annotations

None.

panic_revert

(Statement::PanicRevert)

Description

Outlined Solidity panic revert. Equivalent to writing the Panic(uint256) ABI encoding (selector 0x4e487b71 plus the panic code) into emulated EVM linear memory and reverting, but emitted as a single statement that lowers to one outlined helper call. Common panic codes: 0x01 assertion failure, 0x11 arithmetic overflow, 0x12 division by zero, 0x32 array-out-of-bounds, 0x41 memory overflow.

Syntax

panic_revert(0x<hex>)

Example

panic_revert(0x11)              // arithmetic overflow

Operands

None — the panic code is stored as a u8 field on the IR, not an SSA operand.

Result and purity

| Result | Purity |
| --- | --- |
| None (terminates the call frame) | Effectful (terminator) |

Annotations

| Source field | Printed as |
| --- | --- |
| code: u8 | The panic code in 0x<hex> form (two hex digits, zero-padded). |
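The revert data that panic_revert stands for, and its printed form, can be sketched in Python as follows. The helper names are hypothetical; the layout (4-byte selector followed by the panic code as a 32-byte big-endian word) is the standard ABI encoding of Panic(uint256), not a description of the outlined helper's internals.

```python
PANIC_SELECTOR = bytes.fromhex("4e487b71")   # selector of Panic(uint256)


def panic_revert_data(code: int) -> bytes:
    """Build the 36-byte revert payload: selector plus the code as a 32-byte word."""
    if not 0 <= code <= 0xFF:
        raise ValueError("the panic code is stored as a u8")
    return PANIC_SELECTOR + code.to_bytes(32, "big")


def print_panic_revert(code: int) -> str:
    """Render the statement with a zero-padded, two-digit hex code."""
    return f"panic_revert(0x{code:02x})"


payload = panic_revert_data(0x11)
print(len(payload), payload[:4].hex(), payload[-1])   # 36 4e487b71 17
print(print_panic_revert(0x01))                       # panic_revert(0x01)
```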

error_string_revert

(Statement::ErrorStringRevert)

Description

Outlined Solidity Error(string) revert. Equivalent to writing the Error selector (0x08c379a0), the string offset and length, and up to four 32-byte data words into emulated EVM linear memory and reverting. The string length and the data words are stored as compile-time fields; no SSA operands.

Syntax

error_string_revert(<length>, <N>_words)

Example

error_string_revert(12, 1_words)        // 12-byte string in one 32-byte word

Operands

None — the string length and data are compile-time fields, not SSA operands.

Result and purity

| Result | Purity |
| --- | --- |
| None (terminates the call frame) | Effectful (terminator) |

Annotations

| Source field | Printed as |
| --- | --- |
| length: u8 | The string length in bytes, in the first syntax position. |
| data: Vec<BigUint> | The number of 32-byte data words (1–4), printed as <N>_words in the second syntax position. The actual data is stored separately and not shown in the printed form. |
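The relationship between the string length, the word count, and the revert payload can be illustrated with a Python sketch of the standard Error(string) ABI encoding. The helper names and the sample message are assumptions for illustration; the sketch covers non-empty strings only and does not enforce the four-word limit.

```python
ERROR_SELECTOR = bytes.fromhex("08c379a0")   # selector of Error(string)


def word_count(message: bytes) -> int:
    """Number of 32-byte words needed to hold the string data."""
    return (len(message) + 31) // 32


def error_string_revert_data(message: bytes) -> bytes:
    """ABI-encode Error(string): selector, offset, length, zero-padded string data."""
    offset = (0x20).to_bytes(32, "big")          # string data starts after the offset word
    length = len(message).to_bytes(32, "big")
    data = message.ljust(word_count(message) * 32, b"\x00")
    return ERROR_SELECTOR + offset + length + data


def print_error_string_revert(message: bytes) -> str:
    """Render the statement: string length first, then the data word count."""
    return f"error_string_revert({len(message)}, {word_count(message)}_words)"


message = b"Unauthorized"                        # hypothetical 12-byte error string
print(print_error_string_revert(message))        # error_string_revert(12, 1_words)
print(len(error_string_revert_data(message)))    # 100
```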

custom_error_revert

(Statement::CustomErrorRevert)

Description

Outlined Solidity custom-error revert. Encodes the error selector (left-shifted by 224 bits) and zero or more argument values into scratch memory and reverts. No FMP load is needed; the encoding uses the scratch region at offset 0.

Syntax

custom_error_revert(0x<hex>, [$arg_0, $arg_1, …])

Example

custom_error_revert(0xa28c4c11, [v0, v1])

Operands

| Name | Type | Notes |
| --- | --- | --- |
| arguments | Vec<Value> | Zero or more argument values; the selector is a compile-time field. |

Result and purity

| Result | Purity |
| --- | --- |
| None (terminates the call frame) | Effectful (terminator) |

Annotations

| Source field | Printed as |
| --- | --- |
| selector: BigUint | The 4-byte error selector in hex, in the first syntax position. The selector is stored left-shifted by 224 bits; the printer right-shifts it back and prints the bare 4-byte value. |
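The 224-bit shift places the 4-byte selector in the most significant bytes of a 256-bit word, so that a big-endian store puts it at the start of the revert data. The Python sketch below models the shift, the printer's inverse, and the resulting payload; all function names are hypothetical and the argument values are made up for illustration.

```python
from typing import Sequence


def store_selector(selector: int) -> int:
    """Left-shift a 4-byte selector into the top 4 bytes of a 256-bit word."""
    return selector << 224


def print_selector(stored: int) -> str:
    """Undo the shift, as the printer does, and render the bare 4-byte value."""
    return hex(stored >> 224)


def custom_error_revert_data(stored: int, arguments: Sequence[int]) -> bytes:
    """Selector bytes followed by each argument as a 32-byte big-endian word."""
    head = stored.to_bytes(32, "big")[:4]
    return head + b"".join(arg.to_bytes(32, "big") for arg in arguments)


stored = store_selector(0xA28C4C11)
print(print_selector(stored))                    # 0xa28c4c11
payload = custom_error_revert_data(stored, [1, 2])
print(len(payload), payload[:4].hex())           # 68 a28c4c11
```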

PVM and the pallet-revive runtime target

The revive compiler targets PolkaVM (PVM) via pallet-revive on Polkadot.

Target CPU configuration

The exact target CPU configuration can be found here.

Note

The PVM linker requires fully relocatable ELF objects.

Why PVM

PVM is a RISC-V based VM designed to overcome the flaws of WebAssembly (Wasm). Wasm was believed to be a more efficient successor to the rather slow EVM. However, Wasm is far from an ideal target for smart contracts, as some of its design decisions are unfavorable for short-lived workloads. The main problem is the overhead of compiling or interpreting Wasm bytecode on-chain. Prior benchmarks consistently ignored this overhead, seeding the blockchain industry with flawed assumptions: Wasm is much faster than the slow EVM only when the startup overhead is ignored. In practice, however, the gains are nullified entirely and Wasm loses even against very slow VMs like the EVM. Executing Wasm contracts is in fact so inefficient that typical contract workloads are orders of magnitude more expensive than the equivalent EVM variant.

On the other hand, since RISC-V is similar to the CPUs found in validator hardware (x86 and ARM), bytecode translation mostly boils down to a linear mapping from one instruction to another. The embedded ISA specification reduces the number of general-purpose registers, in turn removing the need for expensive register allocation. This guarantees single-pass O(n) JIT compilation of contract bytecode. The close proximity of PVM bytecode to actual validator CPU bytecode effectively allows moving all expensive compilation work off-chain. Benchmarks (1, 2) show that with the PVM JIT, sandboxed PVM code executes at around half the speed of native code, which is in the same ballpark as the state-of-the-art wasmtime Wasm implementation (while the EVM sits somewhere between 1/10 and less than 1/100 of native speed). However, the PVM JIT compiler needs only a fraction of the time wasmtime requires to compile the code.

Note

The PVM JIT isn’t available yet in pallet-revive. At the time of writing, the contract code is interpreted, which is orders of magnitude slower than the JIT.

Host environment: pallet-revive

The revive compiler targets the pallet-revive runtime environment.

pallet-revive exposes a syscall-like interface for contract interactions with the host environment. This is provided by the revive-runtime-api library.

After the initial launch on the Polkadot Asset Hub blockchain, the runtime API is considered stable and backwards compatible indefinitely.

Testing strategy

Contributors are encouraged to implement some appropriate unit and integration tests together with any bug fixes or new feature implementations. However, when it comes to testing the code generation logic, our testing strategy goes way beyond simple unit and integration tests. This chapter explains how the revive compiler implementation is tested for correctness and how we define correctness.

Tip

Running the integration tests requires the evm tool from go-ethereum in your $PATH.

Either install it using your package manager or build it from source:

git clone https://github.com/ethereum/go-ethereum/
cd go-ethereum
make all
export PATH=/path/to/go-ethereum/build/bin/:$PATH
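
Before running the integration tests, it can help to confirm that an evm executable is actually discoverable. The following is a minimal sketch using only the Python standard library; the helper name `find_tool` is hypothetical and not part of the revive tooling.

```python
import shutil
from typing import Optional


def find_tool(name: str = "evm") -> Optional[str]:
    """Return the absolute path of `name` if it is found in $PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("evm not found in $PATH; install go-ethereum or extend $PATH")
    else:
        print(f"using evm at {path}")
```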

Bug compatibility with Ethereum Solidity

As a Solidity compiler, we aim to preserve contract code semantics as closely as possible to Solidity compiled to EVM with the solc reference implementation. As highlighted in the user guide, due to the underlying target difference, this isn’t always possible. However, wherever it is possible, we follow the philosophy of bug compatibility with the Ethereum contracts stack.

Differential integration tests

A high level of bug compatibility with Ethereum is ensured through differential testing against the Ethereum solc and EVM contracts stack. The revive-integration library is the central integration test utility, providing a set of Solidity integration test cases. Further, it implements differential tests against the reference implementation by combining the revive-runner sandbox, the go-ethereum EVM tool and the revive-differential library.

The revive-runner library provides a declarative test specification format. This vastly simplifies writing differential test cases and removes a lot of room for errors in test logic. Example:

{
    "differential": true,
    "actions": [
        {
            "Instantiate": {
                "code": {
                    "Solidity": {
                        "contract": "Bitwise"
                    }
                }
            }
        },
        {
            "Call": {
                "dest": {
                    "Instantiated": 0
                },
                "data": "3fa4f245"
            }
        }
    ]
}

The above example instantiates the Bitwise contract and calls it with some defined calldata. The revive-runner library implements a helper wrapper to execute test specs on the go-ethereum standalone evm tool. This allows the revive-runner to execute specs against both the EVM and the pallet-revive runtime. Key to differential testing is setting "differential": true, resulting in the following:

  1. The Bitwise contract is compiled to EVM and PVM code.
  2. The runner executes the defined actions on the EVM and collects all state changes (storage, balance) and execution results.
  3. The runner executes each action on the PVM. The state changes observed after each step, as well as the final execution result, are asserted to match their EVM counterparts exactly.

Note how we never defined any expected outcome manually. Instead, we simply observe and collect the data defining the “correct” outcome.

Differential testing in combination with declarative test specifications proved to be simple, yet very effective, in ensuring expected Ethereum Solidity semantics on pallet-revive.
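
The record-then-compare idea behind the steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `State`, `Action` and `Backend` types, both backends and `run_differential` are hypothetical stand-ins and not the revive-runner API. The "contracts" here merely add numbers to storage slots.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only.
State = Dict[str, int]      # storage slot -> value
Action = Tuple[str, int]    # (slot, amount to add)
Backend = Callable[[State, Action], Tuple[State, int]]


def reference_backend(state: State, action: Action) -> Tuple[State, int]:
    """Stand-in for the reference (EVM): apply the action, return state and output."""
    slot, amount = action
    new_state = dict(state)
    new_state[slot] = new_state.get(slot, 0) + amount
    return new_state, new_state[slot]


def buggy_backend(state: State, action: Action) -> Tuple[State, int]:
    """Stand-in for a faulty target: values wrap around at 8 bits."""
    slot, amount = action
    new_state = dict(state)
    new_state[slot] = (new_state.get(slot, 0) + amount) % 256
    return new_state, new_state[slot]


def run_differential(reference: Backend, target: Backend, actions: List[Action]) -> None:
    """Record outcomes on the reference, then assert the target matches per step."""
    # Run on the reference and record state and output after every action.
    expected = []
    state: State = {}
    for action in actions:
        state, output = reference(state, action)
        expected.append((state, output))

    # Replay on the target and compare against the recording.
    state = {}
    for index, action in enumerate(actions):
        state, output = target(state, action)
        if (state, output) != expected[index]:
            raise AssertionError(f"mismatch at action {index}: {action}")
```

No expected value is written by hand: `run_differential(reference_backend, buggy_backend, [("a", 200), ("a", 100)])` raises at the second action purely because the two backends disagree.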

The differential testing utility

A lot of nuanced bugs caused by tiny implementation details inside the revive compiler and the pallet-revive runtime could be identified and eliminated early on thanks to the differential testing strategy. Thus, we decided to take this approach further and created a comprehensive test runner and a large suite of more complex test cases.

The Revive Differential Tests follow the exact same strategy but implement a much more powerful test spec format, spec runner and reports. This allows differentially testing much more complex test cases (for example testing Uniswap pair creations and swaps), executed via transactions sent to actual blockchain nodes.

Cross compilation

We cross-compile the resolc.js frontend executable to Wasm for running it in a Node.js or browser environment.

The musl target is used to obtain statically linked ELF binaries for Linux.

Wasm via emscripten

The REVIVE_LLVM_TARGET_PREFIX environment variable is used to control the target environment LLVM dependency. This requires a compatible LLVM build, obtainable via the revive-llvm build script. Example:

# Build the host LLVM dependency with PolkaVM target support
make install-llvm
export LLVM_SYS_221_PREFIX=${PWD}/target-llvm/gnu/target-final

# Build the target LLVM dependency with PolkaVM target support
revive-llvm emsdk
source emsdk/emsdk_env.sh
revive-llvm --target-env emscripten build --llvm-projects lld
export REVIVE_LLVM_TARGET_PREFIX=${PWD}/target-llvm/emscripten/target-final

# Build the resolc frontend executable
make install-wasm
make test-wasm

musl libc

rust-musl-cross is a straightforward way to cross-compile Rust to musl. The Dockerfile is an executable example of how to do that.

FAQ

What EVM version do you support?

We neither do nor don’t support any EVM version. We support Solidity versions, starting from solc version 0.8.0 onwards.

Is inline assembly supported?

Yes, almost all inline assembly features are supported (see the differences in Yul translation chapter).

Do you support opcode XY?

See above, the same applies.

In what Solidity version should I write my dApp?

We generally recommend always using the latest supported version to benefit from the latest bug fixes, features and performance improvements.

Find out about the latest supported version by running resolc --supported-solc-versions or checking here.

Tool XY says the contract size is larger than 24kb and will fail to deploy?

The 24kb code size restriction only exists for the EVM. Our limit is currently around 1mb and may increase further in the future.
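
As a rough illustration of the difference, the sketch below compares a blob size against both limits. The EVM value is the 24,576 byte limit from EIP-170. The PVM value is an assumption that reads "around 1mb" as 1 MiB; the actual pallet-revive limit may differ. The helper `fits` is hypothetical.

```python
# EVM contract code size limit in bytes (EIP-170).
EVM_CODE_SIZE_LIMIT = 24_576

# Assumption: "around 1mb" interpreted as 1 MiB; the real limit may differ.
PVM_CODE_SIZE_LIMIT_APPROX = 1024 * 1024


def fits(blob_size: int, limit: int) -> bool:
    """Return True if a code blob of `blob_size` bytes is within `limit`."""
    return blob_size <= limit


if __name__ == "__main__":
    size = 100_000  # example blob size in bytes
    print("fits EVM limit:", fits(size, EVM_CODE_SIZE_LIMIT))
    print("fits approximate PVM limit:", fits(size, PVM_CODE_SIZE_LIMIT_APPROX))
```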

Is resolc a drop-in replacement for solc?

No. resolc aims to work similarly to solc, but it’s not considered a drop-in replacement.

Vision and Roadmap

The revive compiler speeds up Solidity contracts significantly. revive provides a decisive edge over other contract platforms. Notably, the compiler eliminates the need to rewrite Solidity dApps in Rust, or even as single dApp parachains, for scaling reasons. Retaining the highest possible compatibility with Ethereum Solidity keeps entry barriers low.

We believe in Dr. Gavin Wood’s ĐApps: What Web 3.0 Looks Like manifesto and the ecosystem of the Solidity programming language. Our motivation lies in the realization that a true web3 revolution can only unfold with significant scaling efforts, like the ones provided by the PVM and this project.

Roadmap

The first major release, resolc v1.0.0, emits functional PVM code from given Solidity sources. It relies on solc and LLVM for optimizations. The main priority of this release was delivering a mostly feature-complete and safe Solidity v0.8.0 compiler.

The focus of the second major release is the custom optimization pipeline, which aims to significantly reduce the size of emitted code blobs.

The roadmap below gives a rough overview of the project’s development timeline.

Roadmap