Welcome
Hello and a warm welcome to the revive Solidity compiler book!
Warning
Solidity on PVM is running on the pallet-revive runtime. This introduces observable semantic differences in comparison with the EVM. Study the differences section carefully. Ignoring these differences may lead to defunct contracts.
Notable examples:
- The 63/64 gas rule isn’t implemented in the pallet (this introduces a potential DoS vector when calling other contracts)
- Contract instantiation works differently (by hash instead of by code)
- The gas model implemented by pallet-revive differs from Ethereum
- The heap size is fixed instead of gas-metered and there’s a fixed amount of stack size (contracts working fine on EVM may trap on PVM)
Target audience
- Solidity dApp developers should read the user guide. Solidity on PolkaVM introduces important differences to EVM which should be well understood.
- Contributors will find the developer guide helpful for getting up to speed.
Other Polkadot contracts resources
Head to contracts.polkadot.io for more general information about contracts on Polkadot.
About
This mdBook documents the revive Solidity compiler project. The content is found under book/. Run make book to observe changes.
resolc user guide
resolc is a Solidity v0.8 compiler for Polkadot native smart contracts. Solidity compiled with resolc targets PolkaVM (PVM). Thanks to additional compiler optimizations and the PVM JIT, contract code can execute much faster than the EVM equivalent. resolc supports almost all Solidity v0.8 features including inline assembly, offering a high level of compatibility with the Ethereum Solidity reference implementation.
revive vs. resolc nomenclature
revive is the name of the overarching “Solidity to PolkaVM” compiler project, which contains multiple components (for example the Yul parser, the code generation library, the resolc executable itself, and many more things).
resolc is the name of the compiler driver executable, combining many revive components in a single and easy to use binary application.
In other words, revive is the whole compiler infrastructure (more like LLVM) and resolc is a user-facing single-entrypoint compiler frontend (more like clang).
Installation
Building Solidity contracts for PolkaVM requires installing the following two compilers:
- solc: The Ethereum Solidity reference compiler implementation.
- resolc: The revive Solidity compiler YUL frontend and PolkaVM code generator.
resolc binary releases
resolc is supported on all major operating systems and installation is straightforward.
Please find our binary releases for the following platforms:
- Linux (MUSL)
- MacOS (universal)
- Windows
- Wasm via emscripten
Installing the solc dependency
resolc uses solc during the compilation process; please refer to the Ethereum Solidity documentation for installation instructions.
revive NPM package
We distribute the revive compiler as a Node.js module.
Building resolc from source
Please follow the build instructions in the revive README.md.
CLI usage
We aim to keep the resolc CLI usage close to solc. However, resolc has a few options worth knowing about that do not exist in the Ethereum world. This chapter explains those in more detail than the CLI help message.
Tip
For the complete help about CLI options, please see
resolc --help.
LLVM optimization levels
-O, --optimization <OPTIMIZATION>
resolc exposes the optimization level setting for the LLVM backend. The performance and size of compiled contracts vary widely between different optimization levels. (This is independent of --newyork, which selects the IR lowering pipeline.)
Valid levels are the following:
- 0: No optimizations are applied.
- 1: Basic optimizations for execution time.
- 2: Advanced optimizations for execution time.
- 3: Aggressive optimizations for execution time.
- s: Optimize for code size.
- z: Aggressively optimize for code size.
By default, -Oz is applied.
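The set of accepted levels can be modeled as a small Rust enum. This is a hypothetical illustration only: the OptimizationLevel type and the parse_level function are not resolc’s actual API, they just mirror the valid values listed above and the -Oz default.

```rust
/// Hypothetical model of the LLVM optimization levels accepted by `resolc -O`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OptimizationLevel {
    NoOptimization,  // 0
    BasicSpeed,      // 1
    AdvancedSpeed,   // 2
    AggressiveSpeed, // 3
    Size,            // s
    AggressiveSize,  // z
}

impl Default for OptimizationLevel {
    /// Mirrors the documented `-Oz` default.
    fn default() -> Self {
        Self::AggressiveSize
    }
}

/// Parses the single-character level; returns `None` for unsupported values.
fn parse_level(level: char) -> Option<OptimizationLevel> {
    match level {
        '0' => Some(OptimizationLevel::NoOptimization),
        '1' => Some(OptimizationLevel::BasicSpeed),
        '2' => Some(OptimizationLevel::AdvancedSpeed),
        '3' => Some(OptimizationLevel::AggressiveSpeed),
        's' => Some(OptimizationLevel::Size),
        'z' => Some(OptimizationLevel::AggressiveSize),
        _ => None,
    }
}

fn main() {
    for level in ['0', '3', 'z', 'x'] {
        println!("-O{}: {:?}", level, parse_level(level));
    }
    println!("default: {:?}", OptimizationLevel::default());
}
```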
newyork IR pipeline
--newyork
Enables the newyork optimizer to reduce compiled contract code size by routing Yul lowering through the experimental newyork IR pipeline instead of the standard Yul-to-LLVM path. Composes with --yul, --combined-json, and the default Solidity mode. In standard JSON mode this flag is rejected; enable the pipeline via the settings.polkavm.newyork input field instead. Off by default.
Stack size
--stack-size <STACK_SIZE>
PVM is a register machine with a traditional stack memory space for local variables. This controls the total amount of stack space the contract can use.
You are incentivized to keep this value as small as possible:
- Increasing the stack size will increase gas costs due to increased startup costs.
- The stack size contributes to the total memory size a contract can use, which includes the contract’s code size.
Default value: 131072
Warning
If the contract uses more stack memory than configured, it will compile fine but eventually revert execution at runtime!
Heap size
--heap-size <HEAP_SIZE>
Unlike the EVM, due to the lack of dynamic memory metering, PVM contracts emulate the EVM heap memory with a static buffer. Consequently, instead of unbounded memory with quadratically growing gas costs, PVM contracts have a finite amount of memory available at constant gas cost.
You are incentivized to keep this value as small as possible:
- Increasing the heap size will increase startup costs.
- The heap size contributes to the total memory size a contract can use, which includes the contract’s code size.
Default value: 131072
Warning
If the contract uses more heap memory than configured, it will compile fine but eventually revert execution at runtime!
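To make the contrast concrete: on Ethereum, the memory cost for a words of 32 bytes follows the Yellow Paper formula C_mem(a) = 3·a + ⌊a²/512⌋ (an access pays the increase of C_mem), so it grows quadratically with memory size. The sketch below evaluates that formula at the default 131072-byte heap size and models the fixed heap as a simple bounds check. The function names are illustrative, and the bounds check is a simplifying assumption rather than the code resolc emits.

```rust
/// Default heap size in bytes (131072 = 128 KiB), as documented for `--heap-size`.
const DEFAULT_HEAP_SIZE: u64 = 131_072;
/// Size of an EVM word in bytes.
const WORD_SIZE: u64 = 32;

/// EVM memory cost for `words` 32-byte words: 3 * a + floor(a^2 / 512).
fn evm_memory_cost(words: u64) -> u64 {
    3 * words + words * words / 512
}

/// Simplified fixed-heap model: an access is either fully in bounds or traps.
fn fixed_heap_access_ok(offset: u64, length: u64, heap_size: u64) -> bool {
    offset
        .checked_add(length)
        .map_or(false, |end| end <= heap_size)
}

fn main() {
    let words = DEFAULT_HEAP_SIZE / WORD_SIZE; // 4096 words
    println!("EVM cost for 128 KiB: {}", evm_memory_cost(words)); // 45056
    println!("EVM cost for 256 KiB: {}", evm_memory_cost(2 * words)); // 155648
    println!("last word in bounds: {}", fixed_heap_access_ok(131_040, 32, DEFAULT_HEAP_SIZE)); // true
    println!("one past the end: {}", fixed_heap_access_ok(131_072, 32, DEFAULT_HEAP_SIZE)); // false
}
```

Doubling the memory on Ethereum more than triples its cost here, while on PVM the size is fixed up front and accesses beyond it fail.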
solc
--solc <SOLC>
Specify the path to the solc executable. By default, the one in ${PATH} is used.
Debug artifacts
--debug-output-dir <DEBUG_OUTPUT_DIRECTORY>
Dump all intermediary compiler artifacts to files in the specified directory. This includes the Yul IR, optimized and unoptimized LLVM IR, the ELF object and the PVM assembly. When the newyork pipeline is active, the newyork IR is additionally dumped (the final IR, a pre-late-pass snapshot, and heap and memory optimization data). Useful for debugging and development purposes.
Debug info
-g
Generate source based debug information in the output code file. Useful for debugging and development purposes and disabled by default.
Deploy time linking
--link [--libraries <LIBRARIES>] <INPUT_FILES>
In Solidity, 3 things can happen with libraries:
1. They are not externally callable and thus can be inlined.
   - The solc Solidity optimizer inlines those (usually the case). Note: resolc always activates the solc Solidity optimizer.
   - If the solc Solidity optimizer is disabled or for some reason fails to inline them (both rare), they are not inlined and require linking.
2. They are externally callable but still linked at compile time. This is the case if the library address is known at compile time (i.e. --libraries supplied in CLI or the corresponding setting in STD JSON input).
3. They are linked at deploy time. This happens when the compiler does not know the library address (i.e. the --libraries flag is missing or the provided libraries are incomplete, same for STD JSON input). This case is rare because it’s discouraged and should never be used by production dApps.
In cases 1.2 (the second sub-case of 1) and 3:
- Some of the produced code blobs will be in the “unlinked” raw ELF object format and not yet deployable.
- To make them deployable, they need to be “linked” (done using the resolc --link linker mode explained below).
- The compiler emits DELEGATECALL instructions to call non-inlined (unlinked) libraries. The contract deployer must make sure to deploy any libraries prior to contract deployment.
Warning
Using deploy time linking is officially discouraged, mainly because bytecode hashes change after the fact. We decided to support it in resolc regardless, due to popular request.
Similar to how it works in solc, --libraries may be used to provide libraries during linking mode.
Unlike with solc, where linking implies a simple string substitution mechanism, resolc needs to resolve actual missing ELF symbols. This is due to how factory dependencies work in PVM. As a consequence, it isn’t sufficient to just provide the unlinked blobs to the linker. Instead, they must be provided in the exact same directory structure in which the Solidity source code was found at compile time.
Example:
- The contract src/foo/bar.sol:Bar is involved in deploy time linking. It may be a factory dependency.
- The contract blob needs to be provided inside a relative src/foo/ directory to --link. Otherwise symbol resolution may fail.
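The directory requirement can be illustrated by deriving the expected relative directory from a fully qualified contract name. The expected_blob_directory helper below is hypothetical and not part of resolc. It only shows that a blob for src/foo/bar.sol:Bar belongs under src/foo/ relative to where --link runs, and it assumes forward-slash paths.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: splits a fully qualified name like `src/foo/bar.sol:Bar`
/// and returns the relative directory the unlinked blob should be placed in.
fn expected_blob_directory(fully_qualified_name: &str) -> Option<PathBuf> {
    let (source_path, _contract_name) = fully_qualified_name.rsplit_once(':')?;
    Path::new(source_path).parent().map(Path::to_path_buf)
}

fn main() {
    let directory = expected_blob_directory("src/foo/bar.sol:Bar");
    println!("{:?}", directory); // Some("src/foo")
}
```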
Note
Tooling is supposed to take care of this. In the future, we may append explicit linkage data to simplify the deploy time linking feature.
JS NPM package
The resolc compiler driver is published as an NPM package under @parity/resolc.
It’s usable from Node.js code or directly from the command line:
npx @parity/resolc@latest --bin crates/integration/contracts/flipper.sol -o /tmp/out
Note
While the npm package makes a nice portable option, it doesn’t expose all options.
Tooling integration
resolc has been successfully integrated with a variety of third-party developer tools.
Solidity toolkits
Support for resolc is available in forks of the hardhat and foundry Solidity toolkits:
Compiler explorer
resolc is available on godbolt.org for the Solidity and Yul input languages. See also the announcement post on the forum.
Remix IDE
There is a Remix IDE fork with resolc support at remix.polkadot.io. Unfortunately, it is no longer actively maintained (there might be bugs and outdated resolc versions).
Standard JSON interface
The revive compiler is mostly compatible with the solc standard JSON interface. There are a few differences and additional (PVM related) input configurations:
The settings.polkavm object
Used to configure PVM specific compiler settings.
settings.polkavm.debugInformation
A boolean value that enables debug information. Corresponds to resolc -g.
settings.polkavm.newyork
A boolean value that enables the experimental newyork IR pipeline for Yul lowering. Corresponds to resolc --newyork. Off by default.
The output JSON includes which pipeline actually ran, via the top-level resolc_pipeline field ("newyork" or "yul" for the standard Yul-to-LLVM pipeline).
The settings.polkavm.memoryConfig object
Used to apply PVM specific memory configuration settings.
settings.polkavm.memoryConfig.heapSize
A numerical value that configures the contract heap size. Corresponds to resolc --heap-size.
settings.polkavm.memoryConfig.stackSize
A numerical value that configures the contract stack size. Corresponds to resolc --stack-size.
The settings.optimizer object
The settings.optimizer object is augmented with support for PVM specific optimization settings.
settings.optimizer.mode
A single char value to configure the LLVM optimizer settings. Corresponds to resolc -O.
settings.llvmArguments
Allows specifying arbitrary command line arguments for LLVM initialization. Used mainly for development and debugging purposes.
The settings.outputSelection object
Used to select desired outputs.
The “all” (*) wildcard
Resolc supports the “all” (*) wildcard for the file-level (first-level) and contract-level (second-level) keys. A file-level key can be either the wildcard or a specific file name, whereas the contract-level key can only be the wildcard for robustness reasons.
Thus, output can be requested in 2 ways:
// All files and all contracts:
{
"settings": {
"outputSelection": {
"*": {
"*": [/* specific contract-level output fields */],
"": [/* specific file-level output fields */]
}
}
}
}
// Specific files and all their contracts:
{
"settings": {
"outputSelection": {
"path/to/my/file.sol": {
"*": [/* specific contract-level output fields */],
"": [/* specific file-level output fields */]
},
// Rest of files...
}
}
}
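The key rules above can be expressed as two small predicates. This is a hypothetical model of the documented rules (a file-level key is the wildcard or a specific file name; a contract-level key is only the wildcard, or "" for file-level outputs), not resolc’s actual implementation.

```rust
/// Hypothetical check of a file-level `outputSelection` key: either the
/// "*" wildcard or the exact file name selects a given source file.
fn file_key_matches(file_key: &str, source_file: &str) -> bool {
    file_key == "*" || file_key == source_file
}

/// Hypothetical check of a second-level key: only the "*" wildcard is accepted
/// for contracts, while "" selects file-level outputs such as `ast`.
fn contract_key_supported(contract_key: &str) -> bool {
    contract_key == "*" || contract_key.is_empty()
}

fn main() {
    let selection = [
        ("*", "*"),
        ("path/to/my/file.sol", ""),
        ("path/to/my/file.sol", "MyContract"),
    ];
    for (file_key, contract_key) in selection {
        println!(
            "{} / {:?}: selects file.sol = {}, key supported = {}",
            file_key,
            contract_key,
            file_key_matches(file_key, "path/to/my/file.sol"),
            contract_key_supported(contract_key)
        );
    }
}
```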
The contract-level evm output selection
Note
Currently, resolc supports requesting either the full evm output, or one more level of specificity, such as evm.bytecode.
When requesting code generation, such as evm.bytecode or evm.assembly, the resolc compilation process additionally needs ast, metadata, irOptimized, and evm.methodIdentifiers selectors. These selectors will be automatically added if code generation is needed, but will only be included in the output if explicitly requested.
{
"settings": {
"outputSelection": {
"path/to/my/file1.sol": {
// Contracts in this file will generate bytecode.
// Only these fields of the JSON output selection will be in the `contracts` output.
"*": ["abi", "evm.methodIdentifiers", "metadata", "evm.bytecode"],
// Only this field of the JSON output selection will be in the `sources` output.
"": ["ast"]
},
"path/to/my/file2.sol": {
// No contracts in this file will generate bytecode.
"*": ["abi", "evm.methodIdentifiers", "metadata"],
// No `ast` will be in the `sources` output (only the automatically added `id`,
// similar to solc as this is not a configurable output selection).
"": []
}
}
}
}
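The automatic addition of selectors can be modeled as two steps: the compiler works on an extended internal selection, and the output is filtered back to what was explicitly requested. This is a simplified, hypothetical sketch; the function names are illustrative, and the list of code-generating selectors is an assumption limited to the ones named above.

```rust
use std::collections::BTreeSet;

/// Selectors assumed to trigger code generation (not an exhaustive list).
const CODEGEN_SELECTORS: [&str; 3] = ["evm", "evm.bytecode", "evm.assembly"];

/// Selectors the compilation process additionally needs for code generation.
const REQUIRED_FOR_CODEGEN: [&str; 4] = ["ast", "metadata", "irOptimized", "evm.methodIdentifiers"];

/// Internal selection: the requested selectors plus whatever code generation needs.
fn internal_selection(requested: &BTreeSet<&'static str>) -> BTreeSet<&'static str> {
    let mut selection = requested.clone();
    if requested.iter().any(|selector| CODEGEN_SELECTORS.contains(selector)) {
        selection.extend(REQUIRED_FOR_CODEGEN);
    }
    selection
}

/// Only explicitly requested outputs are included in the JSON output.
fn visible_outputs(
    produced: &BTreeSet<&'static str>,
    requested: &BTreeSet<&'static str>,
) -> BTreeSet<&'static str> {
    produced.intersection(requested).copied().collect()
}

fn main() {
    let requested = BTreeSet::from(["abi", "evm.bytecode"]);
    let internal = internal_selection(&requested);
    println!("internal: {:?}", internal);
    println!("visible: {:?}", visible_outputs(&internal, &requested));
}
```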
Differences to EVM
This section highlights some potentially observable differences in the YUL EVM dialect translation compared to Ethereum Solidity.
Solidity developers deploying dApps to pallet-revive ought to read and understand this section well.
Deploy code vs. runtime code
Our contract runtime does not differentiate between runtime code and deploy (constructor) code. Instead, both are emitted into a single PVM contract code blob and live on-chain. Therefore, in EVM terminology, the deploy code equals the runtime code.
Tip
In constructor code, the codesize instruction will return the call data size instead of the actual code blob size.
Solidity
We are aware of the following differences in the translation of Solidity code.
The 63/64 gas rule
pallet-revive doesn’t apply the 63/64 gas rule. We strongly advise changing any code that calls untrusted contracts to supply only a limited amount of gas!
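For reference, EIP-150 caps the gas an EVM call can forward at all but one 64th of the caller’s available gas, so the caller always keeps a reserve to finish its own execution. Without that cap, an untrusted callee can consume everything it is given. The sketch below shows the arithmetic; gas is modeled as a plain u64 for simplicity (an assumption, since pallet-revive’s gas model differs), and capped_forward is a hypothetical caller-side policy. In Solidity, such a limit is set with the call option syntax, e.g. addr.call{gas: limit}(data).

```rust
/// EIP-150: maximum gas an EVM call may forward, given the caller's available gas.
fn evm_max_forwardable(available: u64) -> u64 {
    available - available / 64
}

/// Gas an EVM caller is guaranteed to keep while the callee runs.
fn evm_reserved(available: u64) -> u64 {
    available / 64
}

/// Hypothetical caller-side policy without the 63/64 rule: forward only an
/// explicit limit, never more than what is available.
fn capped_forward(available: u64, explicit_limit: u64) -> u64 {
    explicit_limit.min(available)
}

fn main() {
    let available = 6_400;
    println!("EVM forwards at most {}", evm_max_forwardable(available)); // 6300
    println!("EVM caller keeps {}", evm_reserved(available)); // 100
    println!("explicit cap forwards {}", capped_forward(available, 1_000)); // 1000
}
```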
type(C).creationCode
This returns the keccak256 hash of the bytecode instead.
YUL functions
The below list contains noteworthy differences in the translation of YUL functions.
Note
Many functions receive memory buffer offset pointer or size arguments. The PVM pointer size is 32 bit; supplying memory offset or buffer size values above 2^32-1 may trap contract execution with OutOfGas errors.
The solc compiler ought to always emit valid memory references, so Solidity dApp authors don’t need to worry about this unless they deal with low level assembly code.
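The effect of the 32-bit pointer size can be sketched as a checked narrowing of an EVM word to a PVM pointer. To stay dependency-free, u128 stands in for the 256-bit EVM word, and the trap is modeled as an error value. Both are simplifying assumptions; this is not the code revive emits.

```rust
use std::convert::TryFrom;

/// Stand-in for a 256-bit EVM word (simplification: u128 instead of 256 bits).
type EvmWord = u128;

/// Models the trap raised when an offset or size does not fit the 32-bit pointer.
#[derive(Debug, PartialEq)]
enum Trap {
    PointerOutOfRange(EvmWord),
}

/// Narrows an EVM memory offset or size to a 32-bit PVM value, trapping if it does not fit.
fn to_pvm_pointer(value: EvmWord) -> Result<u32, Trap> {
    u32::try_from(value).map_err(|_| Trap::PointerOutOfRange(value))
}

fn main() {
    println!("{:?}", to_pvm_pointer(0x40)); // Ok(64)
    println!("{:?}", to_pvm_pointer(1 << 32)); // Err(PointerOutOfRange(4294967296))
}
```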
mload, mstore, msize, mcopy (memory related functions)
In general, revive preserves the memory layout, meaning low level memory operations are supported. However, a few caveats apply:
- The EVM linear heap memory is emulated using a fixed byte buffer of 128 KiB by default (configurable via --heap-size). This implies that the maximum memory a contract can use is limited to the configured heap size (on Ethereum, contract memory is capped by gas and therefore varies).
- Thus, accessing memory offsets larger than the fixed buffer size will trap the contract at runtime with an OutOfBound error.
- The compiler might detect and optimize unused memory reads and writes, leading to a different msize compared to what the EVM would see.
calldataload, calldatacopy
In the constructor code, the offset is ignored and this always returns 0.
Warning
pallet-revive restricts the calldata size (to 128kb at the time of writing).
codecopy
Only supported in constructor code.
create, create2
Deployments on revive work differently than on the EVM. In a nutshell: Instead of supplying the deploy code concatenated with the constructor arguments (the EVM deploy model), the revive runtime expects two pointers:
- A buffer containing the code hash to deploy.
- The constructor arguments buffer.
To make contract instantiation using the new keyword in Solidity work seamlessly, revive translates the dataoffset and datasize instructions so that they assume the contract hash instead of the contract code. The hash is always of constant size. Thus, revive is able to supply the expected code hash and constructor arguments pointer to the runtime.
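The difference between the two deploy models can be sketched as the shape of the data handed to the runtime. The types and functions below are hypothetical; they only illustrate that revive passes a fixed 32-byte code hash and a separate constructor argument buffer, which is why the translated datasize can be the constant 32.

```rust
/// Size of a code hash, matching the constant returned by the translated `datasize`.
const CODE_HASH_SIZE: usize = 32;

/// EVM model: init code and ABI-encoded constructor arguments in one buffer.
fn evm_create_payload(init_code: &[u8], constructor_args: &[u8]) -> Vec<u8> {
    let mut payload = init_code.to_vec();
    payload.extend_from_slice(constructor_args);
    payload
}

/// Hypothetical revive model: two separate buffers, the first always a fixed-size hash.
struct ReviveInstantiate<'a> {
    code_hash: [u8; CODE_HASH_SIZE],
    constructor_args: &'a [u8],
}

fn revive_instantiate(code_hash: [u8; CODE_HASH_SIZE], constructor_args: &[u8]) -> ReviveInstantiate<'_> {
    ReviveInstantiate { code_hash, constructor_args }
}

fn main() {
    let args = [0u8; 64];
    let evm = evm_create_payload(&[0x60; 100], &args);
    let revive = revive_instantiate([0xab; CODE_HASH_SIZE], &args);
    println!("EVM payload length: {}", evm.len()); // 164
    println!(
        "revive hash length: {}, args length: {}",
        revive.code_hash.len(),
        revive.constructor_args.len()
    ); // 32, 64
}
```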
Warning
This might fall apart in code creating contracts inside assembly blocks. We strongly discourage using the create family opcodes to manually craft deployments in assembly blocks! Usually, the reason for using assembly blocks is to save gas, which is likely futile on revive due to the underlying differences in the VM architectures, gas models and transaction costs.
dataoffset
Returns the contract hash.
datasize
Returns the contract hash size (constant value of 32).
revert, return
pallet-revive restricts the returndata size (to 128kb at the time of writing).
prevrandao, difficulty
Translates to a constant value of 2500000000000000.
pc, extcodecopy
Only valid to use in EVM (they also have no use case in PVM) and produce a compile time error.
blobhash, blobbasefee
Related to the Ethereum rollup model and produce a compile time error. Polkadot offers a superior rollup model, removing the use case for blob data related opcodes.
Difference regarding the solc via-ir mode
There are two different compilation pipelines available in solc and there are small differences between them.
Since resolc processes the YUL IR, always assume the solc IR based codegen behavior for contracts compiled with the revive compiler.
Example: State variable initialization order in inheritance
With via-ir, base constructors run before derived state variables are initialized:
contract InnerContract {
uint public innerConstructedStartTokenId;
constructor() {
innerConstructedStartTokenId = _startTokenId();
}
function _startTokenId() internal view virtual returns (uint) {
return 0;
}
}
contract Test is InnerContract {
uint public START_TOKEN_ID = 1;
constructor() InnerContract() {
}
function _startTokenId() internal view virtual override returns (uint) {
return START_TOKEN_ID;
}
}
Here, innerConstructedStartTokenId in Test returns 0 (with legacy EVM codegen it’d return 1).
Rust contract libraries
Note
This is not yet implemented but something for consideration on the roadmap.
Solidity, being tightly coupled to the EVM, introduces some inherent inefficiencies that are by design and either need to be followed or can’t be easily worked around, even with efforts like better optimized compiler and VM implementations. This represents a technical dead end. So far the EVM sees no adoption beyond the blockchain industry. Chances are that the EVM ends up deprecated for technical reasons (or maybe not and the RISC-V idea gets abandoned, who knows).
PVM, however, is a general purpose VM. It supports LLVM based mainstream programming languages like Rust. It’s a common software engineering practice to compose applications from pieces written in multiple languages, using each for its own strengths. For example, AI solutions traditionally use the Python scripting language for a convenient developer experience, while the underlying AI models get implemented in a lower level language such as C++.
The same pattern can of course be applied to dApps, where we’d expect application specific languages like Solidity mixed with libraries implementing computationally complex algorithms in a lower level language. Business logic and user interfaces are naturally implemented as regular Solidity dApps which can include (link against) Rust libraries. Rust is a fast, safe low level language and the Polkadot SDK is written in Rust itself, making it an excellent choice.
For example, ZK proof verifiers or expensive DeFi primitives would benefit greatly from Rust implementations.
revive provides tooling support and a small Rust contracts SDK for seamless integration with Rust libraries.
revive-runner sandbox
Running contract code usually requires a blockchain node. While local dev nodes can be used, sometimes it’s just not desirable to do so. Instead, it can be much more convenient to run and debug contract code with a stripped down environment.
This is where the revive-runner comes in handy. In a nutshell, it is a single-binary no-blockchain pallet-revive runtime.
Installation and usage
Inside the root revive repository directory, install it from source (requires Rust installed):
make install-revive-runner
After installing, see revive-runner --help for usage help.
Trace logs
The standard RUST_LOG environment variable controls the log output from the contract execution. This includes revive runtime logs and PVM execution trace logs. Sometimes it’s convenient to have more fine-grained insight. Some useful filters:
- RUST_LOG=runtime=trace: The pallet-revive runtime trace logs.
- RUST_LOG=polkavm=trace: Low level PolkaVM instruction tracing.
Automatic contract instantiation
To avoid running the contract in an uninitialized state, revive-runner automatically instantiates the contract before calling it (constructor arguments can be provided).
Example
Suppose we want to trace the syscalls of the execution of a compiled contract file Flipper.pvm:
RUST_LOG=runtime=trace revive-runner -f Flipper.pvm
[DEBUG runtime::revive] Contract memory usage: purgable=6144/3145728 KB baseline=103063/1572864
[TRACE runtime::revive::strace] call_data_size() = Ok(0) gas_consumed: Weight { ref_time: 985209, proof_size: 0 }
[TRACE runtime::revive::strace] value_transferred(out_ptr: 4294836096) = Ok(()) gas_consumed: Weight { ref_time: 2937634, proof_size: 0 }
[TRACE runtime::revive::strace] call_data_copy(out_ptr: 131216, out_len: 0, offset: 0) = Ok(()) gas_consumed: Weight { ref_time: 4084483, proof_size: 0 }
[TRACE runtime::revive::strace] seal_return(flags: 0, data_ptr: 131216, data_len: 0) = Err(TrapReason::Return(ReturnData { flags: 0, data: [] })) gas_consumed: Weight { ref_time: 5510615, proof_size: 0 }
[TRACE runtime::revive] frame finished with: Ok(ExecReturnValue { flags: (empty), data: [] })
[TRACE runtime::revive::strace] call_data_size() = Ok(0) gas_consumed: Weight { ref_time: 985209, proof_size: 0 }
[TRACE runtime::revive::strace] seal_return(flags: 1, data_ptr: 131088, data_len: 0) = Err(TrapReason::Return(ReturnData { flags: 1, data: [] })) gas_consumed: Weight { ref_time: 2456669, proof_size: 0 }
[TRACE runtime::revive] frame finished with: Ok(ExecReturnValue { flags: REVERT, data: [] })
Developer guide
This chapter covers internal aspects of the compiler and helps contributors getting started with the revive codebase.
Contributor guide
The revive compiler is an open source software project and we gladly accept quality contributions from anyone!
Getting started
A quick reference on how to build the Solidity compiler is maintained in the project’s README.md.
Using the Makefile
The Makefile comprehensively encapsulates all development aspects of this codebase. It is kept concise and readable. Please read and use it! You’ll learn for example:
- How to build and install a resolc development version.
- How to run tests and benchmarks.
- How to cross-compile resolc.
As a general rule-of-thumb: If make test runs fine locally, chances for green CI pipelines are good.
Codebase organization
For the most part, revive is a rather standard Rust workspace codebase. There are some non-Rust dependencies, which sometimes complicate things a little bit.
The crates/ dir
All Rust crates live under the crates/ directory. The workspace automatically considers any crate found therein. If you need to add a new crate, please implement it there.
Compiler library crates should be named with the revive- prefix. The crate location doesn’t need the prefix.
Dependencies
Dependencies should be added as workspace dependencies. Try to avoid pinning dependencies whenever possible. If possible, add dev dependencies as dev-dependencies only.
Please do always include the Cargo.lock dependency lock file with your PR. Please don’t run cargo update together with other changes (it is preferred to update the lock file in a dedicated dependency update PR).
Contribution rules
- Changes must be submitted via a pull request (PR) to the github upstream repository.
- Ensure that your branch passes make test locally when submitting a pull request.
- A PR must not be merged until CI fully passes. Exceptions can be made (for example to fix CI issues themselves).
- No force pushes to the main branch and open PR branches.
- Maintainers can request changes or deny contributions at their own discretion.
Style guide
We require the official Rust formatter and clippy linter. In addition to that, please also consider the following best-effort aspects:
- Avoid magic numbers and strings. Instead, add them as module constants.
- Avoid abbreviated variable and function names. Always provide meaningful and readable symbols.
- Don’t write macros and don’t use third party macros for things that can easily be expressed in a few lines of code or outlined into functions.
- Avoid import aliasing. Please use the parent or fully qualified path for conflicting symbols.
- Any inline comments must provide additional semantic meaning, explain counter-intuitive behavior or highlight non-obvious design decisions. In other words, try to make the code expressive enough to a degree it doesn’t need comments expressing the same thing again in the English language. Delete such comments if your AI assistant generated them.
- Public items must have a meaningful doc comment.
- Provide meaningful panic messages to .expect() or just use .unwrap().
AI policy
Contributors may use whatever AI assistance tools they wish to whatever degree they wish in the process of creating their contribution, given they acknowledge the following:
Project maintainers may reject any contribution (or portions of it) if the contribution shows signs of problematic involvement of generative AI.
Judgement of “problematic involvement” lies at the sole discretion of project maintainers. No proof (whether a contribution was in fact AI generated or not) is required. Rationale:
- No one enjoys reading soulless and uncanny LLM slop. Please review and fix any AI slop yourself prior to submitting a PR.
- A Solidity compiler is security sensitive software. Even minuscule mistakes can ultimately lead to loss of funds. AI models are inherently stochastic. They regularly fail to capture important nuances or produce outright hallucinations. Code that was “blindly” generated has no home here.
- revive is a large codebase. Generative AI assistants may not have enough “context window” to sufficiently capture correctness, consistency and style aspects of the codebase. We’d like to keep this codebase maintainable by humans for the foreseeable future.
Compiler architecture and internals
revive relies on solc, the Ethereum Solidity compiler, as the Solidity frontend to process smart contracts written in Solidity. LLVM, a popular and powerful compiler framework, is used as the compiler backend and does the heavy lifting in terms of optimizations and RISC-V code generation.
revive mainly takes care of lowering the Yul intermediate representation (IR) produced by solc to LLVM IR. This approach provides a good balance between maintaining a high level of Ethereum compatibility, good contract performance and feasible engineering efforts.
resolc
resolc is the overarching compiler driver library and binary.
When compiling a Solidity source file with resolc, the following steps happen under the hood:
1. solc is used to lower the Solidity source code into YUL intermediate representation.
2. revive lowers the YUL IR into LLVM IR.
3. LLVM optimizes the code and emits a RISC-V ELF shared object (through LLD).
4. The PolkaVM linker finally links the ELF shared object into a PolkaVM blob.
This compilation process can be visualized as follows:
Reproducible contract builds
Because on-chain contract code is identified via its code blob hash, it is crucial to maintain reproducible contract builds. A given compiler version must reproduce the contract build exactly on every target platform resolc supports via the official binary releases.
To ensure this, we employ the following measures:
- The code generation must be fully deterministic. For example, iterating over a standard HashMap invalidates this because its hasher is randomly seeded, so the iteration order can differ between runs; this makes it an invalid operation in revive. To circumvent that, a BTreeMap can be used instead.
- We release fully statically linked resolc binaries. This prevents dynamic linking of potentially differentiating libraries.
- The only non-bundled dependency is the solc compiler. This is considered fine because the same properties apply to solc.
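The determinism rule can be demonstrated with the standard library alone: BTreeMap iterates in key order, while HashMap iteration order depends on its randomly seeded hasher. The emit_in_order function is an illustrative stand-in for any code generation step that walks a map; it is not taken from revive.

```rust
use std::collections::{BTreeMap, HashMap};

/// Illustrative stand-in for a code generation step that emits one item per map entry.
fn emit_in_order(functions: &BTreeMap<&str, u32>) -> Vec<String> {
    functions
        .iter()
        .map(|(name, id)| format!("{}:{}", name, id))
        .collect()
}

fn main() {
    let entries: [(&str, u32); 3] = [("transfer", 2), ("approve", 1), ("balanceOf", 3)];

    // Deterministic: always sorted by key, on every run and platform.
    let ordered: BTreeMap<&str, u32> = entries.iter().copied().collect();
    println!("{:?}", emit_in_order(&ordered)); // ["approve:1", "balanceOf:3", "transfer:2"]

    // Not deterministic: the order may change between runs, so it must not drive codegen.
    let unordered: HashMap<&str, u32> = entries.iter().copied().collect();
    println!("{:?}", unordered.keys().collect::<Vec<_>>());
}
```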
The revive compiler libraries
The main compiler logic is implemented in the revive-yul and revive-llvm-context crates.
The Yul library implements a lexer and parser and lowers the resulting tree into LLVM IR. It does so by emitting LL using the LLVM builder and our own revive-llvm-context compiler context crate. The revive LLVM context crate encapsulates code generation logic (decoupled from the parser).
The Yul library also implements a simple visitor interface (see visitor.rs). If you want to work with the AST, it is strongly recommended to implement visitors. The LLVM code generation is implemented using a dedicated trait for historical reasons only.
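The benefit of visitors can be illustrated with a minimal, self-contained sketch. The Expression type, the Visitor trait and the walk function here are hypothetical and much simpler than the actual interface in visitor.rs. They only show how a visitor keeps AST traversal separate from the analysis, so a new analysis overrides just the hooks it needs.

```rust
/// Hypothetical, heavily simplified Yul-like expression tree.
enum Expression {
    Literal(u64),
    Identifier(String),
    FunctionCall { name: String, arguments: Vec<Expression> },
}

/// Hypothetical visitor trait: every hook defaults to doing nothing.
trait Visitor {
    fn visit_literal(&mut self, _value: u64) {}
    fn visit_identifier(&mut self, _name: &str) {}
    fn visit_function_call(&mut self, _name: &str) {}
}

/// Walks the tree depth-first and dispatches each node to the visitor.
fn walk<V: Visitor>(expression: &Expression, visitor: &mut V) {
    match expression {
        Expression::Literal(value) => visitor.visit_literal(*value),
        Expression::Identifier(name) => visitor.visit_identifier(name),
        Expression::FunctionCall { name, arguments } => {
            visitor.visit_function_call(name);
            for argument in arguments {
                walk(argument, visitor);
            }
        }
    }
}

/// Example analysis: counts calls to one builtin, e.g. `sload`.
struct CallCounter {
    target: &'static str,
    count: usize,
}

impl Visitor for CallCounter {
    fn visit_function_call(&mut self, name: &str) {
        if name == self.target {
            self.count += 1;
        }
    }
}

fn main() {
    // add(sload(x), sload(0))
    let tree = Expression::FunctionCall {
        name: "add".into(),
        arguments: vec![
            Expression::FunctionCall {
                name: "sload".into(),
                arguments: vec![Expression::Identifier("x".into())],
            },
            Expression::FunctionCall {
                name: "sload".into(),
                arguments: vec![Expression::Literal(0)],
            },
        ],
    };
    let mut counter = CallCounter { target: "sload", count: 0 };
    walk(&tree, &mut counter);
    println!("sload calls: {}", counter.count); // 2
}
```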
EVM heap memory
PVM doesn’t offer a similar API. Hence the emitted contract code emulates the linear EVM heap memory using a static byte buffer. Data inside this byte buffer is kept big endian for EVM compatibility reasons (unaligned access is allowed and makes optimizing this non-trivial).
Unlike with the EVM, where heap memory usage is gas metered, our heap size is static (the size is user controllable via a setting flag). The compiler emits bounds checks to prevent overflows.
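The emulation described above can be sketched as a fixed byte buffer with bounds-checked, big-endian 32-byte word access. This is a simplified model under stated assumptions: a 256-bit word is represented as a [u8; 32] already in big-endian order, and an out-of-bounds access returns an error instead of trapping. The code revive actually emits differs.

```rust
/// Size of an EVM word in bytes.
const WORD_SIZE: usize = 32;

/// Simplified model of the emulated EVM heap: a fixed-size byte buffer.
struct EmulatedHeap {
    bytes: Vec<u8>,
}

/// Models the trap raised by the emitted bounds checks.
#[derive(Debug, PartialEq)]
struct OutOfBounds;

impl EmulatedHeap {
    fn new(heap_size: usize) -> Self {
        Self { bytes: vec![0; heap_size] }
    }

    /// Bounds check: the byte range of a word at `offset`, if it fits in the buffer.
    fn word_range(&self, offset: usize) -> Result<std::ops::Range<usize>, OutOfBounds> {
        let end = offset.checked_add(WORD_SIZE).ok_or(OutOfBounds)?;
        if end > self.bytes.len() {
            return Err(OutOfBounds);
        }
        Ok(offset..end)
    }

    /// `mstore`: writes a big-endian word; unaligned offsets are allowed.
    fn mstore(&mut self, offset: usize, word: [u8; WORD_SIZE]) -> Result<(), OutOfBounds> {
        let range = self.word_range(offset)?;
        self.bytes[range].copy_from_slice(&word);
        Ok(())
    }

    /// `mload`: reads a big-endian word; unaligned offsets are allowed.
    fn mload(&self, offset: usize) -> Result<[u8; WORD_SIZE], OutOfBounds> {
        let range = self.word_range(offset)?;
        let mut word = [0; WORD_SIZE];
        word.copy_from_slice(&self.bytes[range]);
        Ok(word)
    }
}

/// Encodes a small value as a big-endian 256-bit word.
fn word_from_u64(value: u64) -> [u8; WORD_SIZE] {
    let mut word = [0; WORD_SIZE];
    word[WORD_SIZE - 8..].copy_from_slice(&value.to_be_bytes());
    word
}

fn main() {
    let mut heap = EmulatedHeap::new(131_072);
    heap.mstore(0x40, word_from_u64(0x80)).unwrap();
    // Unaligned read one byte later: the stored low byte moves one position left.
    println!("{:?}", heap.mload(0x41).map(|word| word[WORD_SIZE - 2])); // Ok(128)
    // A word straddling the end of the buffer fails the bounds check.
    println!("{:?}", heap.mstore(131_072 - 16, word_from_u64(1))); // Err(OutOfBounds)
}
```

Keeping the buffer big-endian makes byte-level views such as the unaligned read above behave as on the EVM, at the price of byte swapping on a little-endian machine.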
The LLVM dependency
LLVM is a special non-Rust dependency. We access its builder API via the inkwell wrapper crate.
We use upstream LLVM, but release and use our custom builds. We require the compiler builtins specifically built for the PVM rv64emacb target and always leave assertions on. Furthermore, we need cross builds because resolc itself targets emscripten and musl. The revive-llvm-builder crate functions as a cross-platform build script and is used to build and release the LLVM dependency.
We also maintain the lld-sys crate for interfacing with LLD. The LLVM linker is used during the compilation process, but we don’t want to distribute another binary.
Custom optimizations
An experimental newyork optimizer introduces a custom IR layer between Yul and LLVM IR to capture optimization opportunities that neither solc nor LLVM can realize on their own. solc optimizes for EVM gas on a 256-bit big-endian stack machine, while LLVM lacks the domain knowledge to understand EVM memory semantics or Solidity patterns. The newyork IR bridges this gap with passes for type narrowing, memory optimization, function deduplication, and more.
The newyork optimizer
The newyork crate (crates/newyork/) introduces an additional intermediate representation (IR) layer between Yul and LLVM IR. It enables domain-specific optimizations that neither solc nor LLVM can perform on their own, because they lack semantic knowledge about the cross-domain compilation from EVM to PolkaVM.
Note
The newyork optimizer is experimental. It is gated behind the --newyork CLI flag or the settings.polkavm.newyork field in standard JSON input, and not yet enabled by default.
Motivation
The EVM and PolkaVM are fundamentally different machines:
| Property | EVM | PolkaVM (RISC-V) |
|---|---|---|
| Word size | 256-bit | 64-bit |
| Endianness | Big-endian | Little-endian |
| Architecture | Stack machine | Register-based |
| Memory model | Linear with free pointer convention | Flat address space |
solc optimizes Yul IR for EVM gas costs on a 256-bit big-endian stack machine. LLVM, on the other hand, operates at too low a level to understand EVM memory semantics or Solidity patterns. By the time Yul reaches LLVM IR, the high-level intent is lost.
The newyork IR sits between these two worlds and recovers enough semantic information to make optimization decisions that neither compiler can make alone.
Pipeline overview
```text
                   ┌────────────────────────────────────────────────────┐
Yul AST ─────────► │                     newyork IR                     │ ──► LLVM IR ──► RISC-V
          from_yul │                                                    │ to_llvm
                   │   1. inline                                        │
                   │   2. simplify (pass 1)                             │
                   │   3. dedup (exact + fuzzy)                         │
                   │   4. mem_opt + fmp_prop + keccak_fold              │
                   │   5. simplify (pass 2)                             │
                   │   6. mapping_access_outlining + guard_narrow       │
                   │   7. simplify (pass 3)                             │
                   │   8. dedup (exact + fuzzy, pass 2)                 │
                   │      ── recursive on subobjects ──                 │
                   │   9. type_inference (iterative narrowing)          │
                   │  10. late inline loop: inline, simplify, outline,  │
                   │      guard-narrow, simplify, dedup, narrow         │
                   │  11. heap_opt (analysis)                           │
                   │  12. validate                                      │
                   └────────────────────────────────────────────────────┘
```
The optimizer runs the following passes in order:
- Inlining – custom heuristics tuned for PolkaVM call overhead, with Tarjan SCC-based recursion detection and quadratic leave-overhead modeling.
- Simplify (pass 1) – constant folding, algebraic identities, strength reduction (mul by power-of-2 to shl), copy propagation, dead code elimination, environment read CSE (callvalue, caller, origin, etc.), and revert pattern outlining (panic selectors, custom error selectors).
- Function deduplication – exact structural match, then fuzzy dedup (functions differing only in literal constants are parameterized and merged, up to 4 differing positions).
- Memory optimization – load-after-store elimination, keccak256 fusion (mstore + keccak256 sequences into Keccak256Single/Keccak256Pair nodes), free memory pointer propagation (replaces mload(0x40) with a known constant), and constant keccak256 folding (precomputes hashes of compile-time-constant inputs).
- Simplify (pass 2) – cleans up dead code and new constant expressions exposed by memory optimization and keccak folding.
- Compound outlining and guard narrowing – compound outlining detects keccak256_pair + sload/sstore sequences and fuses them into MappingSLoad/MappingSStore IR nodes, eliminating intermediate hash values. Guard narrowing detects if gt(val, MASK) { revert } and iszero(eq(val, and(val, MASK))) patterns and inserts AND-mask narrowing, giving type inference proof that values fit in fewer bits.
- Simplify (pass 3) – propagates opportunities created by compound outlining and guard narrowing.
- Function deduplication (pass 2) – catches new duplicates exposed by guard narrowing and compound outlining canonicalization.
- Type inference – narrows 256-bit values to smaller widths (I1, I8, I32, I64, I128, I160) where provable. Runs iteratively for up to 4 cascading refinement rounds, combining forward min-width propagation, backward use-context demands, transparent-operation demand propagation, and interprocedural parameter/return narrowing.
- Late inline loop – now that narrowing and simplification have shrunk wrapper functions below the inline thresholds, re-runs inlining, simplification, mapping access outlining, guard narrowing, deduplication, and type inference to collect the residual opportunities.
- Heap analysis – analyzes memory access patterns (alignment, static offsets, taintedness, escaping regions) to determine which accesses can use native little-endian layout, skipping byte-swap operations. Uses GCD-based alignment propagation and per-region taint tracking.
- Validation – checks SSA well-formedness (use-before-def, multiple definitions), yield count consistency, and function reference correctness.
Steps 1-8 run recursively on subobjects (deployed contract code), where optimization impact is greatest. Steps 9-12 run on the full object tree.
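The ordering above can be summarized as a small pass-manager sketch. Everything in it is hypothetical: the Object type, the pass names written as strings, and the choice to visit a parent before its subobjects. It only records which passes would run where. The point it illustrates is that steps 1-8 repeat for every subobject, while steps 9-12 run once over the whole tree.

```rust
// Hypothetical sketch of pass scheduling; not the code in lib.rs.

/// Stand-in for a compiled object and its nested subobjects
/// (for example, deploy code containing the deployed runtime code).
struct Object {
    name: String,
    subobjects: Vec<Object>,
}

/// Steps 1-8: applied to an object, then recursively to each subobject.
const EARLY_PASSES: [&str; 8] = [
    "inline",
    "simplify#1",
    "dedup#1",
    "mem_opt+fmp_prop+keccak_fold",
    "simplify#2",
    "mapping_access_outlining+guard_narrow",
    "simplify#3",
    "dedup#2",
];

/// Steps 9-12: run once over the full object tree.
const LATE_PASSES: [&str; 4] = ["type_inference", "late_inline_loop", "heap_opt", "validate"];

/// Records "object:pass" for every early pass, visiting parents before children
/// (the visiting order is an assumption of this sketch).
fn run_early_passes(obj: &Object, log: &mut Vec<String>) {
    for pass in EARLY_PASSES.iter() {
        log.push(format!("{}:{}", obj.name, pass));
    }
    for sub in &obj.subobjects {
        run_early_passes(sub, log);
    }
}

/// Returns the order in which passes would be scheduled for `root`.
fn schedule(root: &Object) -> Vec<String> {
    let mut log = Vec::new();
    run_early_passes(root, &mut log);
    for pass in LATE_PASSES.iter() {
        log.push(format!("<tree>:{}", pass));
    }
    log
}
```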
IR design
The newyork IR is an SSA form with structured control flow, inspired by MLIR’s SCF dialect. Key design choices:
- Explicit types with address spaces: Every value carries a bit-width (I1, I8, I32, I64, I128, I160, I256) and pointers carry address space information (Heap, Stack, Storage, Code). All values start as I256 and are narrowed by type inference.
- Pure expressions vs. effectful statements: Expressions compute values without side effects; statements perform memory, storage, or control flow effects. This separation simplifies analysis and rewriting.
- Semantic annotations: Memory operations are tagged with region information (Scratch, FreePointerSlot, Dynamic). Storage operations carry static slot values when known at compile time.
- Structured control flow: If, Switch, and For nodes preserve the high-level structure from Yul, with explicit region arguments and yields for value flow across control edges.
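The design choices above can be pictured with a much-reduced Rust model. The variant names for BitWidth, AddressSpace, Type, and MemoryRegion follow the type system section of this book. The shapes of Value, Expression, and Statement are hypothetical simplifications, not the definitions in ir.rs. Only a few operations are modeled, and the helper count_writes exists purely for illustration.

```rust
// Hypothetical, heavily simplified model of the IR; not ir.rs.

/// Integer widths, ordered narrowest to widest (derived order follows declaration order).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum BitWidth { I1, I8, I32, I64, I128, I160, I256 }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AddressSpace { Heap, Stack, Storage, Code }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Type { Int(BitWidth), Ptr(AddressSpace), Void }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MemoryRegion { Scratch, FreePointerSlot, Dynamic, Unknown }

/// An SSA value reference; every value starts out as i256.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Value { id: u32, ty: Type }

impl Value {
    fn new(id: u32) -> Self {
        Value { id, ty: Type::Int(BitWidth::I256) }
    }
}

/// Pure expressions: compute a value, never touch memory or storage.
#[derive(Clone, Debug, PartialEq)]
enum Expression {
    Literal(u64), // simplified: real literals hold 256-bit constants
    Add(Value, Value),
    IsZero(Value),
}

/// Statements carry the effects: bindings, writes, and structured control flow.
#[derive(Clone, Debug, PartialEq)]
enum Statement {
    Let { result: Value, expr: Expression },
    MStore { offset: Value, value: Value, region: MemoryRegion },
    SStore { slot: Value, value: Value, static_slot: Option<u64> },
    /// Structured `if`: values yielded by `body` become `results` after the node.
    If { cond: Value, body: Vec<Statement>, yields: Vec<Value>, results: Vec<Value> },
}

/// Counts memory and storage writes, descending into nested regions.
fn count_writes(stmts: &[Statement]) -> usize {
    stmts
        .iter()
        .map(|s| match s {
            Statement::Let { .. } => 0,
            Statement::MStore { .. } | Statement::SStore { .. } => 1,
            Statement::If { body, .. } => count_writes(body),
        })
        .sum()
}
```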
For per-operation detail — printed syntax, operand and result types, and more — see the newyork IR reference.
Key optimizations explained
Type narrowing
EVM operates on 256-bit words, but most values in practice fit in 32 or 64 bits. The type inference pass performs bidirectional analysis:
- Forward: computes minimum width from literal values and operation semantics (e.g., add(I64, I8) produces I65, rounded up to I128).
- Backward use tracking: classifies each value's uses into 9 context categories (MemoryOffset, MemoryValue, StorageAccess, Comparison, Arithmetic, FunctionArg, FunctionReturn, ExternalCall, General). All categories conservatively demand the full I256 width by default. The categorization is what enables the interprocedural phase to selectively relax the demand for narrowed function arguments. Earlier versions narrowed directly from the use category, but that was unsound for memory offsets: mload(2^128) aliased to mload(0) because the bounds check ran on an already-truncated value (commit ccca38df).
- Transparent demand propagation: for modular-arithmetic operations (Add, Sub, Mul, And, Or, Xor), propagates narrow demands backward through operands. This exploits the property trunc(op(a, b), N) == op_N(trunc(a, N), trunc(b, N)), where op_N is the same operation performed modulo 2^N.
- Interprocedural: iteratively narrows function parameter and return types in up to four rounds, re-running full inference between rounds. It combines four narrowing strategies: body-driven parameter narrowing, caller-driven parameter narrowing, forward-based return narrowing, and demand-based return narrowing. Parameters are clamped to at least I32 (XLEN on PolkaVM).
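The forward rule and the transparency property can both be checked in a few lines. This is a sketch with several simplifications: it works on u64 rather than 256-bit values, models only add for the forward rule (add and mul for transparency), and uses hypothetical function names rather than anything from type_inference.rs. The rung list mirrors the widths named above.

```rust
// Hypothetical sketch of two type-inference facts; not type_inference.rs.

/// Integer width rungs available in the IR, narrowest first.
const RUNGS: [u32; 7] = [1, 8, 32, 64, 128, 160, 256];

/// Rounds a required bit count up to the next available rung.
fn round_up_to_rung(bits: u32) -> u32 {
    RUNGS.iter().copied().find(|&r| r >= bits).unwrap_or(256)
}

/// Forward rule for `add`: an a-bit plus a b-bit value needs at most
/// max(a, b) + 1 bits; EVM addition wraps, so never more than 256.
fn forward_add_width(a: u32, b: u32) -> u32 {
    round_up_to_rung((a.max(b) + 1).min(256))
}

/// Keeps the low `n` bits of `x` (1 <= n <= 64 in this sketch).
fn trunc(x: u64, n: u32) -> u64 {
    if n >= 64 { x } else { x & ((1u64 << n) - 1) }
}

/// Transparency for add and mul: the low `n` bits of the result depend only
/// on the low `n` bits of the operands, so a narrow demand can flow backward.
/// The right-hand side is op_N: the operation on truncated inputs, taken modulo 2^n.
fn is_transparent(a: u64, b: u64, n: u32) -> bool {
    let add_ok = trunc(a.wrapping_add(b), n) == trunc(trunc(a, n).wrapping_add(trunc(b, n)), n);
    let mul_ok = trunc(a.wrapping_mul(b), n) == trunc(trunc(a, n).wrapping_mul(trunc(b, n)), n);
    add_ok && mul_ok
}
```

The first assertion in the usage reproduces the example from the text: I65 rounds up to I128.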
This allows LLVM to emit native 32/64-bit instructions instead of software-emulated 256-bit arithmetic, and eliminates expensive multi-instruction comparison sequences (16-20 RISC-V instructions for i256 comparisons reduced to 1-2 for i64).
Guard narrowing
Solidity emits runtime guards that prove values fit in narrow ranges (e.g., address validation via if gt(val, 2^160-1) { revert }). The guard narrowing pass detects these patterns and inserts explicit AND-mask narrowing after the guard. This gives downstream type inference proof that the value fits in fewer bits, enabling cascading narrowing of comparisons, arithmetic, and memory operations that use the guarded value.
Two pattern families are recognized:
- GT-based guards: if gt(val, MASK) { <terminates> }, where MASK is a boundary value like 2^N - 1
- EQ-based guards: iszero(eq(val, and(val, MASK))) patterns common in Solidity's address validation
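Recognizing a GT-based guard hinges on checking that MASK has the form 2^N - 1. After if gt(val, MASK) { <terminates> }, the fall-through path knows that val <= MASK, so val fits in N bits. An inserted and(val, MASK) is then a no-op that records this fact for type inference. The sketch below shows only the mask check, with a 256-bit constant modeled as four u64 limbs. The function name is hypothetical and not taken from guard_narrow.rs.

```rust
// Hypothetical sketch of the mask-shape check; not guard_narrow.rs.

/// A 256-bit constant as four 64-bit limbs, least significant limb first.
type U256 = [u64; 4];

/// Returns Some(n) if `mask` equals 2^n - 1 (exactly the low n bits set).
fn low_mask_bits(mask: &U256) -> Option<u32> {
    let mut bits = 0;
    let mut ended = false; // set once the first zero bit has been seen
    for &limb in mask.iter() {
        if ended {
            if limb != 0 {
                return None; // a set bit above the first zero bit
            }
            continue;
        }
        let ones = limb.trailing_ones();
        if ones < 64 {
            if limb >> ones != 0 {
                return None; // ones are not contiguous from bit 0
            }
            ended = true;
        }
        bits += ones;
    }
    Some(bits)
}
```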
Heap optimization
PVM doesn’t provide EVM-compatible linear memory, so the compiler emulates it using a byte buffer with byte-swap operations for big-endian compatibility. The heap analysis pass determines which memory accesses can use native little-endian layout by analyzing access patterns:
- Tracks alignment and static offset information for all memory accesses using GCD-based propagation
- Propagates taintedness when addresses escape to external calls, are written by external sources (codecopy, calldatacopy), or use unaligned access patterns
- Tracks variable-accessed offsets to prevent mode mismatches between native and byte-swap accesses to the same location
- Handles loop-carried variables conservatively (marked as non-literal to prevent false constant propagation)
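One way to picture GCD-based alignment propagation is a congruence abstraction. Each offset is tracked as "≡ rem (mod modulus)", and adding two offsets keeps the GCD of their moduli. The sketch below is an illustrative model built on that assumption, not the representation used in heap_opt.rs. The type and method names are hypothetical.

```rust
// Hypothetical congruence model of alignment tracking; not heap_opt.rs.

/// Every concrete offset is congruent to `rem` modulo `modulus`.
/// `modulus == 0` means the offset is exactly `rem` (a known constant).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Congruence {
    modulus: u64,
    rem: u64,
}

fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

impl Congruence {
    fn constant(c: u64) -> Self {
        Congruence { modulus: 0, rem: c }
    }
    /// Nothing known: every integer is congruent to 0 modulo 1.
    fn unknown() -> Self {
        Congruence { modulus: 1, rem: 0 }
    }
    /// Multiples of `m`, e.g. an index scaled by the 32-byte word size.
    fn multiple_of(m: u64) -> Self {
        Congruence { modulus: m, rem: 0 }
    }
    /// Sum of two offsets: the result is known modulo the GCD of the moduli.
    fn add(self, other: Self) -> Self {
        let modulus = gcd(self.modulus, other.modulus);
        let sum = self.rem + other.rem;
        Congruence { modulus, rem: if modulus == 0 { sum } else { sum % modulus } }
    }
    /// True if every concrete offset is a multiple of `align`.
    fn is_aligned_to(self, align: u64) -> bool {
        self.modulus % align == 0 && self.rem % align == 0
    }
}
```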
The codegen backend supports four memory access modes: AllNative (all accesses skip byte-swap), InlineNative (constant-offset accesses use native layout), InlineByteSwap (constant-offset accesses use inline byte-swap), and ByteSwap (standard byte-swap through helper functions).
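A toy heap shows why mixing access modes at one location is dangerous. The sketch below stores the same value once in EVM (big-endian) word layout and once in native little-endian layout. Each mode reads back its own writes correctly, but reading a natively written word through the byte-swap path returns a different value. The helper names are hypothetical, and values are limited to u64 for brevity.

```rust
// Hypothetical helpers illustrating the two layouts; not the codegen helpers.

/// EVM-style 32-byte word store: big-endian, so the value's low byte is last.
fn mstore_be(heap: &mut [u8], off: usize, v: u64) {
    heap[off..off + 24].fill(0);
    heap[off + 24..off + 32].copy_from_slice(&v.to_be_bytes());
}

/// EVM-style load of the low 8 bytes of a big-endian word
/// (the upper 24 bytes are ignored in this sketch).
fn mload_be(heap: &[u8], off: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&heap[off + 24..off + 32]);
    u64::from_be_bytes(bytes)
}

/// Native store: little-endian, no byte swap, so the value's low byte is first.
fn mstore_native(heap: &mut [u8], off: usize, v: u64) {
    heap[off..off + 8].copy_from_slice(&v.to_le_bytes());
    heap[off + 8..off + 32].fill(0);
}

/// Native load of a word written by `mstore_native`.
fn mload_native(heap: &[u8], off: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&heap[off..off + 8]);
    u64::from_le_bytes(bytes)
}
```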
Free memory pointer range proof
The Solidity free memory pointer (mload(0x40)) always fits in 32 bits — sbrk enforces FMP < heap_size on every store, regardless of which memory mode the contract uses. After every literal mload(0x40), codegen emits a trunc N → zext 256 chain (where N is bits(heap_size - 1), e.g. 17 for the 131,072-byte default heap). The trunc-extend round-trip is a no-op semantically, but exposes the bound to LLVM’s IPSCCP range analysis, which then propagates it through every add(fmp, K) and eliminates the trailing safe_truncate_int_to_xlen overflow checks at every FMP-derived offset use. Despite only affecting a single codegen site, this is the single largest contributor to the optimizer’s code-size reduction.
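The arithmetic behind the range proof is small enough to check directly. The sketch below models the reasoning, not the codegen. It derives N from the heap size and shows that fmp + K has a small known upper bound for modest constants K. It assumes, for illustration, that the check being removed tests whether the 256-bit offset fits in the 32-bit xlen width. The function names are hypothetical.

```rust
// Hypothetical model of the FMP range reasoning; not to_llvm.rs.

/// Number of bits needed to represent `x`.
fn bit_len(x: u64) -> u32 {
    64 - x.leading_zeros()
}

/// Width N for the trunc/zext round-trip: every stored FMP is < heap_size,
/// i.e. at most heap_size - 1, which fits in bits(heap_size - 1) bits.
fn fmp_bits(heap_size: u64) -> u32 {
    bit_len(heap_size - 1)
}

/// With FMP < 2^n, can `fmp + k` exceed 32 bits? If not, an overflow check
/// on that derived offset is provably dead (under this sketch's assumption).
fn fmp_plus_k_fits_u32(n: u32, k: u64) -> bool {
    let max_fmp = (1u64 << n) - 1;
    max_fmp
        .checked_add(k)
        .map_or(false, |max_offset| max_offset <= u64::from(u32::MAX))
}
```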
A subtle gating issue: the byte-order mode (InlineNative / ByteSwap) and the value bound on FMP are independent invariants. fmp_native_safe() and can_use_native(0x40) protect against mixing little-endian writers with big-endian readers on the FMP slot, which would corrupt the stored offset. The value bound is unrelated and holds in every mode. Earlier versions of the codegen gated the load-side range proof on the byte-order checks, which suppressed the optimization for any contract with dynamic memory accesses. Decoupling the two concerns makes the multiplicative IPSCCP effect available to OpenZeppelin-class contracts: the byte-order gate stays on the store side and is dropped from the load-side range proof.
Soundness traps for FMP optimizations
The FMP slot is small but easy to mis-optimize. The codebase carries several regression tests for previously-found soundness bugs; new FMP-related changes should be verified against them:
- mload_at_fmp_slot (crates/integration/src/tests.rs, fixed in 1fd6063c): tests mload(0x40) and offsets near it (0x21, 0x3f, 0x42) on a contract that also performs dynamic mloads. Catches byte-order mismatches when one access goes native (LE) and another goes byte-swap (BE). The fix blocks native mode for FMP whenever has_dynamic_accesses is true.
- mload_huge_offset_traps (fixed in ccca38df): tests that mload(2^128) and mload(2^255) correctly trap via the gas-exhaustion path. Catches UseContext::MemoryOffset narrowing that bypassed the safe_truncate_int_to_xlen overflow check at the use site, which made mload(2^128) alias to mload(0) and return the zero-initialized scratch slot. The fix classifies MemoryOffset as I256 so it doesn't drive narrowing; the bounds check at the use site then catches out-of-range offsets.
- FMP i32 shortcut removal (dbcfc921): an earlier optimization stored only 4 bytes at offset 0x40 instead of the full 32-byte EVM word. This broke any inline assembly using mstore(0x40, ...) for non-FMP purposes and caused a cascade of 249/251 retester failures via allocator corruption. No dedicated regression test was added, because the retester corpus was sufficient coverage. The lesson generalizes, though: writes to 0x40 must store the full word, even when the high bits are provably zero, because the slot is part of the same 32-byte memory region read by other code.
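The mload_huge_offset_traps bug can be reproduced in miniature. If the value is truncated first and checked afterward, 2^128 becomes indistinguishable from 0. The sketch below contrasts a naive truncation with a check on the full-width value. U256 is modeled as four u64 limbs, and both function names are hypothetical stand-ins rather than the real safe_truncate_int_to_xlen.

```rust
// Hypothetical illustration of the aliasing bug; not the real codegen helpers.

/// A 256-bit value as four 64-bit limbs, least significant limb first.
type U256 = [u64; 4];

/// Unsound: keeps only the low 32 bits, so 2^128 becomes offset 0.
fn truncate_then_use(x: &U256) -> u32 {
    x[0] as u32
}

/// Sound: inspects the full-width value first and refuses unless it is
/// below `heap_size`. Assumes heap_size <= 2^32. `None` models the trap.
fn check_then_truncate(x: &U256, heap_size: u64) -> Option<u32> {
    let high_bits_clear = x[1] == 0 && x[2] == 0 && x[3] == 0;
    if high_bits_clear && x[0] < heap_size {
        Some(x[0] as u32)
    } else {
        None
    }
}
```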
When adding an optimization that touches FMP, distinguish carefully between:
- the byte-order encoding at the slot (must be consistent between writers and readers),
- the value bound (FMP < heap_size, always true), and
- the stored width (must be 32 bytes for mstore(0x40, ...), even though only the low N bits are non-zero).
Known limitation: dynamic full-word stores to the FMP word
The fmp_could_be_unbounded analysis flags a static mstore(0x40, untrusted) and any
dynamic-offset mstore8, but not a dynamic-offset full-word mstore. Such a store whose
i256 offset wraps (mod 2²⁵⁶) to the FMP word [0x40, 0x5f] overwrites the free pointer with an
arbitrary value, which the load-side range proof would then truncate — a miscompile.
This gap is left open deliberately rather than fixed, because there is no cheap sound
discriminator. A store hits the FMP word iff its offset lands in [0x40, 0x5f], which is
in-bounds — safe_truncate_int_to_xlen only traps offsets ≥ heap_size — and 256-bit wrap lets
any computed add(base, k) reach 0x40 with an adversarial operand, so the offset cannot be
proven to miss the slot from width/range information. The only sound recognizer (treat
add(fmp, small_const) as ≥ 0x80 by induction on FMP-boundedness) needs new FMP-derivation
dataflow, is fragile, and still misses dynamic-index array stores. Conservatively flagging every
dynamic full-word store (as the rare dynamic mstore8 does, where it is free) disables the FMP
range proof for essentially every contract — measured at roughly +9% / +30 KB on the
OpenZeppelin corpus.
The gap is unreachable from solc output: solc’s dynamic memory stores are all free-pointer-relative
(≥ 0x80) and never target 0x40. Only hand-written Yul (resolc --yul) with an offset
engineered to equal 0x40 reaches it.
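A small 256-bit model confirms the core of the argument: wraparound lets any add(base, k) land on 0x40. In the sketch below, U256 is four u64 limbs and the helper names are hypothetical. It picks base = 0x40 - k (mod 2^256) and checks that the wrapped sum hits the FMP word. Knowing k's width or range alone therefore cannot rule the slot out.

```rust
// Hypothetical 256-bit wrapping arithmetic for illustration only.

/// A 256-bit value as four 64-bit limbs, least significant limb first.
type U256 = [u64; 4];

/// Addition modulo 2^256, propagating the carry limb by limb.
fn wrapping_add(a: &U256, b: &U256) -> U256 {
    let mut out = [0u64; 4];
    let mut carry = false;
    for i in 0..4 {
        let (s1, c1) = a[i].overflowing_add(b[i]);
        let (s2, c2) = s1.overflowing_add(carry as u64);
        out[i] = s2;
        carry = c1 || c2;
    }
    out
}

/// Subtraction modulo 2^256, propagating the borrow limb by limb.
fn wrapping_sub(a: &U256, b: &U256) -> U256 {
    let mut out = [0u64; 4];
    let mut borrow = false;
    for i in 0..4 {
        let (d1, b1) = a[i].overflowing_sub(b[i]);
        let (d2, b2) = d1.overflowing_sub(borrow as u64);
        out[i] = d2;
        borrow = b1 || b2;
    }
    out
}
```

With k = 0x1000, the adversarial base is 2^256 - 0xfc0, an enormous value whose sum with k still wraps around to 0x40.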
Keccak256 fusion and folding
Two complementary optimizations target the common Solidity pattern of hashing values for storage slot computation:
- Fusion: Recognizes mstore + keccak256 sequences and fuses them into dedicated IR nodes (Keccak256Single, Keccak256Pair), eliminating intermediate memory traffic.
- Constant folding: When all keccak256 inputs are compile-time constants, the hash is computed at compile time and replaced with a literal.
Known limitation: constant-folding drops the fused-keccak scratch write-back
The fused Keccak256Pair/Keccak256Single helpers write their inputs back to scratch memory
([0, 0x40) / [0, 0x20)), and fusion dead-eliminates the original mstores because that
write-back reproduces them. Constant-folding the fused node to a literal removes the helper, so the
scratch is left unwritten — a later mload from [0, 0x40) that the optimizer cannot forward
(across a region/call boundary) would read stale memory.
This gap is deliberately left open. It is solc-unreachable, because solc treats scratch as volatile and never re-reads it as data after a keccak. In addition, every sound fix is a code-size regression, because any fix must add back stores that the current output omits. Disabling the fold falls back to the runtime keccak helper (+0.78% on the OZ corpus), and no later mem_opt pass runs to clean up the re-emitted writes. Only hand-written Yul that reads scratch after a constant-operand keccak across a boundary can observe it.
Compound outlining (mapping access)
Solidity mapping accesses follow a predictable pattern: hash a key with a storage slot, then load/store the result. The compound outlining pass detects keccak256_pair(key, slot) followed by sload/sstore and fuses them into MappingSLoad/MappingSStore IR nodes. These are lowered to outlined helper functions (__revive_mapping_sload, __revive_mapping_sstore) that combine the hash computation with the storage operation, eliminating intermediate values and redundant byte-swaps.
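The rewrite can be sketched over a flat list of let bindings. All names are hypothetical, and the Expr type keeps only the three shapes involved. Requiring the hash to have a single use is an assumption of this sketch, since that is what makes removing the intermediate hash value safe. Only the sload half is shown.

```rust
// Hypothetical sketch of mapping-load outlining; not mapping_access_outlining.rs.
use std::collections::{HashMap, HashSet};

/// Simplified expressions; SSA values are plain ids.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Keccak256Pair { key: u32, slot: u32 },
    SLoad(u32),
    MappingSLoad { key: u32, slot: u32 },
}

/// One `let vN := expr` binding.
#[derive(Clone, Debug, PartialEq)]
struct Let {
    result: u32,
    expr: Expr,
}

/// SSA values read by an expression.
fn operands(e: &Expr) -> Vec<u32> {
    match e {
        Expr::Keccak256Pair { key, slot } | Expr::MappingSLoad { key, slot } => vec![*key, *slot],
        Expr::SLoad(h) => vec![*h],
    }
}

/// Rewrites `h := keccak256_pair(k, s); v := sload(h)` into
/// `v := mapping_sload(k, s)` when `h` has no other uses.
fn outline_mapping_loads(body: &[Let]) -> Vec<Let> {
    let mut use_count: HashMap<u32, usize> = HashMap::new();
    for l in body {
        for v in operands(&l.expr) {
            *use_count.entry(v).or_insert(0) += 1;
        }
    }
    let hashes: HashMap<u32, (u32, u32)> = body
        .iter()
        .filter_map(|l| match l.expr {
            Expr::Keccak256Pair { key, slot } => Some((l.result, (key, slot))),
            _ => None,
        })
        .collect();

    let mut fused: HashSet<u32> = HashSet::new();
    let mut out: Vec<Let> = Vec::new();
    for l in body {
        if let Expr::SLoad(h) = l.expr {
            if let (Some(&(key, slot)), Some(&1)) = (hashes.get(&h), use_count.get(&h)) {
                out.push(Let { result: l.result, expr: Expr::MappingSLoad { key, slot } });
                fused.insert(h);
                continue;
            }
        }
        out.push(l.clone());
    }
    // Drop hash bindings whose only use was folded into a mapping load.
    out.retain(|l| !fused.contains(&l.result));
    out
}
```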
Fuzzy function deduplication
Solidity generates many near-identical functions that differ only in literal constants (e.g., error selectors, storage slot offsets). Fuzzy deduplication identifies such groups, parameterizes the differing literals (up to 4 positions), and replaces all copies with calls to a single shared implementation.
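The matching step can be sketched by flattening function bodies to a token sequence. This flattening is a hypothetical simplification: per the known limitations below, the real pass only considers literals in Let bindings and SStore slots. Two bodies match if every non-literal token agrees. The positions of differing literals become the extra parameters, capped at four.

```rust
// Hypothetical sketch of fuzzy matching; not simplify.rs.

/// A flattened function body: structural tokens and literal constants.
#[derive(Clone, Debug, PartialEq)]
enum Tok {
    Op(&'static str),
    Lit(u64),
}

/// Maximum number of literal positions that may be turned into parameters.
const MAX_PARAMS: usize = 4;

/// Returns the positions of differing literals if `a` and `b` are identical
/// apart from those literals; `Some(vec![])` means an exact duplicate.
fn fuzzy_match(a: &[Tok], b: &[Tok]) -> Option<Vec<usize>> {
    if a.len() != b.len() {
        return None;
    }
    let mut diffs = Vec::new();
    for (i, (x, y)) in a.iter().zip(b).enumerate() {
        match (x, y) {
            (Tok::Op(p), Tok::Op(q)) if p == q => {}
            (Tok::Lit(p), Tok::Lit(q)) => {
                if p != q {
                    diffs.push(i);
                }
            }
            _ => return None, // structural difference
        }
    }
    if diffs.len() <= MAX_PARAMS { Some(diffs) } else { None }
}
```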
Revert pattern outlining
The simplify pass detects common revert patterns and replaces them with compact IR nodes:
- Panic reverts: Solidity Panic(uint256) sequences (selector 0x4e487b71 + encoded panic code) are collapsed into PanicRevert { code } nodes, which are lowered to shared helper functions.
- Custom error reverts: ABI-encoded custom error reverts with known selectors are collapsed into CustomErrorRevert { selector, args } nodes.
These patterns appear dozens of times in typical contracts, and outlining them into shared blocks eliminates significant code duplication.
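For reference, the payload such a panic sequence builds is a 4-byte selector followed by one ABI-encoded 32-byte word. Every panic revert in a contract therefore differs only in its code. The sketch below assembles that payload with a hypothetical helper, not the outlined codegen function. 0x11 is Solidity's arithmetic overflow/underflow panic code.

```rust
// Hypothetical helper showing the Panic(uint256) revert payload layout.

/// Selector of Solidity's `Panic(uint256)` error.
const PANIC_SELECTOR: [u8; 4] = [0x4e, 0x48, 0x7b, 0x71];

/// ABI-encodes the revert payload for `Panic(code)`: the 4-byte selector
/// followed by the code as a 32-byte big-endian word (36 bytes in total).
fn panic_revert_data(code: u8) -> Vec<u8> {
    let mut data = PANIC_SELECTOR.to_vec();
    let mut word = [0u8; 32];
    word[31] = code; // panic codes are small, so only the last byte is set
    data.extend_from_slice(&word);
    data
}
```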
Outlined helper functions
The LLVM codegen backend generates approximately 15 types of outlined helper functions for common operations:
- Storage: __revive_sload_word, __revive_sstore_word (handle byte-swap internally)
- Mapping: __revive_mapping_sload, __revive_mapping_sstore (keccak256 + storage in one call)
- Callvalue: __revive_callvalue, __revive_callvalue_nonzero (boolean optimization for non-payable checks)
- Calldataload: __revive_calldataload (outlined when there are >= 20 call sites)
- Memory: __revive_store_bswap, __revive_exit_checked, __revive_return_word
- Errors: __revive_error_string_revert_N, __revive_custom_error_N (per data-word count)
- Keccak wrappers: __keccak256_slot_N (one noinline wrapper per constant slot, internally dispatching to __revive_keccak256_two_words)
Additionally, common exit patterns (revert with constant length, zero-value returns) are deduplicated into shared LLVM basic blocks, saving hundreds of instruction copies in large contracts.
Codesize results
Integration test contracts
Reproducible with cargo test --package revive-integration -- codesize for the Via Yul IR column (crates/integration/codesize.json) and cargo test --package revive-integration --features newyork -- codesize for the Via newyork IR column (crates/integration/codesize_newyork.json).
| Contract | Via Yul IR (bytes) | Via newyork IR (bytes) | Reduction |
|---|---|---|---|
| Baseline | 838 | 493 | −41.2% |
| Computation | 2,368 | 1,217 | −48.6% |
| DivisionArithmetics | 11,444 | 7,370 | −35.6% |
| ERC20 | 18,057 | 8,726 | −51.7% |
| Events | 1,614 | 909 | −43.7% |
| FibonacciIterative | 1,373 | 969 | −29.4% |
| Flipper | 2,205 | 1,058 | −52.0% |
| SHA1 | 7,830 | 6,264 | −20.0% |
OpenZeppelin contracts
Measured against real-world contracts generated with the OpenZeppelin Wizard. The numbers below are a development snapshot.
| Contract | Via newyork IR (bytes) |
|---|---|
| oz_gov | 81,840 |
| erc721 | 52,634 |
| erc20 | 45,703 |
| oz_stable | 45,052 |
| oz_rwa | 41,581 |
| erc1155 | 33,087 |
| oz_simple_erc20 | 17,024 |
| proxy | 3,748 |
| Total | 320,669 |
For comparison, building the same contracts without the newyork optimizer at the equivalent snapshot produced 563,526 bytes total — a reduction of about −43% across the corpus.
Per-contract reductions in the integration suite range from roughly −20% (SHA1, where the bulk of the work is the SHA-1 inner loop and offers little to optimize) to about −52% (Flipper, where the optimizer strips away most of Solidity’s dispatch and storage-access scaffolding).
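The reduction column and the corpus total can be re-derived from the tables in a few lines. The helper name is arbitrary, and the constants are copied from the tables above.

```rust
// Helper for re-deriving the table percentages; the name is arbitrary.

/// Relative size change in percent; negative values are reductions.
fn change_pct(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64 * 100.0
}

/// Via newyork IR sizes of the OpenZeppelin contracts, in table order.
const OZ_NEWYORK: [u64; 8] = [81_840, 52_634, 45_703, 45_052, 41_581, 33_087, 17_024, 3_748];
```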
Development history and challenges
The first version of the newyork optimizer was authored collaboratively and reviewed extensively by the revive maintainers, as well as Claude Opus, Claude Fable, Qwen, Minimax and Deepseek LLMs, over a span of many months — from early February 2026 through mid-June 2026.
The development progressed through several distinct phases:
Phase 1 – Initial scaffolding: The first draft established the core IR data structures, Yul-to-newyork-IR translation, and LLVM codegen. Early commits focused on getting a correct round-trip through the new pipeline.
Phase 2 – Optimization passes: Once the baseline was stable, optimization passes were added iteratively: inlining, simplification, memory optimization, function deduplication, keccak256 fusion, and type inference. Each pass was validated against differential tests comparing EVM and PVM execution.
Phase 3 – Soundness hardening: Several type inference and narrowing approaches turned out to be unsound and had to be reworked:
- An early type inference approach caused namespace collisions across subobjects and was scoped per-object.
- Caller-based parameter narrowing was polluted by overly aggressive inference and replaced with body-based structural analysis.
- Backward demand-driven narrowing required multiple iterations to become provably safe.
Phase 4 – Measuring and tuning: Systematic measurement of OpenZeppelin contracts revealed which optimizations had the most impact and which approaches regressed performance.
Throughout development the optimizer was validated against the existing integration and differential test suites (containing over 30,000 test cases), which run each contract on both EVM and PVM and assert identical state changes.
The newyork compiler pipeline introduced no new regressions in these test suites. This was achieved through careful manual reviews and many LLM bug-hunting loops. Additionally, a final security review by Anthropic's Fable 5 LLM found no remaining soundness issues. As with any new compiler feature, it should still be treated as experimental for now.
Approaches that did not work
| Approach | Outcome |
|---|---|
| Storage bswap decomposition (4x bswap.i64) | Regressed: LLVM handles bswap.i256 better natively |
| NoInline on __revive_int_truncate | +62% regression: PolkaVM call overhead exceeds inline cost |
| Native FMP memory (inline sbrk) | Mixed: small contracts improved, large ones regressed from sbrk bloat |
| Shared overflow trap block | Mixed: prevented LLVM from eliminating individual dead overflow checks |
| Aggressive IR-level single-call inlining | Regressed large contracts (ERC20 +6.1%): merged bodies become monolithic functions LLVM can’t optimize, so large functions are deferred to LLVM’s inliner instead |
| Type-inference narrowing of mload(0x40) to I32 | Regressed small contracts (+252 bytes): conflicts with the codegen FMP range proof; the bound is exposed via a trunc→zext pair instead |
| Full simplifier re-run after mem_opt | Mixed: helped small ERC20 (−293 bytes) but regressed the OZ stablecoin (+72 bytes); replaced by a targeted keccak-only fold |
These results highlight a recurring theme: interacting well with LLVM’s own optimization passes is critical. Optimizations at the IR level can inadvertently inhibit LLVM’s downstream passes, sometimes causing surprising regressions.
Known limitations and future work
The following opportunities have been identified but are not yet implemented:
- Memory optimization across loop boundaries: Tracked memory state is cleared around for loop condition, body, and post blocks, so load-after-store eliminations do not carry across loop iterations. Preserving loop-invariant state would recover more eliminations.
- Adaptive inlining thresholds: Current thresholds are static constants. Profile-guided or contract-size-aware heuristics could improve decisions for diverse contract sizes.
- Extended fuzzy deduplication: The current pass only parameterizes literals in Let bindings and SStore slots. Extending it to consider literals inside MStore, Return, Revert, and Log statements would find more deduplication opportunities.
- Type checking in validation: The validator checks SSA well-formedness and structure, but not operation type consistency. Type discipline is maintained by construction (type inference and codegen), with LLVM's IR verifier as the backstop.
- Loop variable narrowing: Loop-carried variables are conservatively widened to I256. Reaching a fixed point across loop iterations could allow narrower types for simple counters.
- Functions with leave inside a for loop are not inlined: the IR-level inliner defers such functions to LLVM's inliner, so they miss the interprocedural constant propagation and width narrowing the IR-level pass provides.
Debug output
Passing --debug-output-dir <path> makes the newyork pipeline write IR and analysis artifacts for each compiled contract into that directory. The dumps are produced automatically whenever the directory is set.
| File | Content |
|---|---|
| <file-stem>.newyork | Final optimized IR, annotated with the inferred type widths |
| <file-stem>.snapshot.newyork | IR snapshot taken before the late passes (only when captured during translation) |
| <file-stem>.heap.newyork | Heap analysis summary (native regions/offsets, taintedness, escapes, dynamic accesses) |
| <file-stem>.mem.newyork | Memory optimization counters (loads/stores eliminated, keccak fusions, FMP loads eliminated) |
Module reference
| Module | Purpose |
|---|---|
| lib.rs | Pipeline orchestration and pass sequencing |
| ir.rs | Core IR data structures (types, expressions, statements, functions, objects) |
| from_yul.rs | Yul AST to newyork IR translation (two-pass with forward reference support) |
| to_llvm.rs | newyork IR to LLVM IR codegen with outlined helpers and narrowing |
| simplify.rs | Constant folding, algebraic identities, strength reduction, copy propagation, DCE, environment read CSE, revert outlining, callvalue hoisting, function deduplication (exact and fuzzy), constant keccak folding |
| inline.rs | Function inlining with PolkaVM-tuned heuristics (Tarjan SCC, leave elimination) |
| type_inference.rs | Bidirectional integer width narrowing with transparent demand propagation |
| mem_opt.rs | Load-after-store elimination, keccak256 fusion, FMP propagation |
| heap_opt.rs | Heap access pattern analysis, alignment tracking, byte-swap elimination |
| mapping_access_outlining.rs | Mapping access pattern detection and fusion (keccak256_pair + sload/sstore) |
| guard_narrow.rs | Guard pattern detection and AND-mask narrowing insertion |
| validate.rs | IR well-formedness checks (SSA, yields, function references) |
| printer.rs | Human-readable IR pretty printer with configurable output |
| ssa.rs | SSA construction helpers (scope management, phi-node merging) |
newyork IR reference
A per-operation reference for the newyork IR: textual syntax, operand and result types, purity, region and static-slot annotations, and examples.
How to read this reference
This reference page enumerates every operation the newyork IR supports. It is a lookup, not a walkthrough: each entry is self-contained and intended to be reachable by anchor.
Operations are grouped by function (memory and storage writes, pure expressions, control flow, and so on) rather than alphabetically. Jump to a specific operation from the operation index below, or use the sidebar.
Every operation appears in two places in the codebase. The canonical Rust definition is a variant of either Expression or Statement in ir.rs. The textual rendering used by debug dumps and by this reference page is produced by the printer in printer.rs.
Note
Treat the printed syntax as a debug surface, not a stable input language: there is no parser for it, and printer details change when passes add new annotations.
Entry format
Each operation entry has the same shape:
| Field | What it shows |
|---|---|
| Heading | The printed operation name (e.g. mstore) followed by the Expression or Statement variant it corresponds to in ir.rs. |
| Description | A short prose summary of what the operation does and any semantic notes worth knowing before reading the rest of the entry. |
| Syntax | The literal printer output, including any optional debug annotations (region tags, static-slot comments). Anything inside /* ... */ is a debug-only annotation and is not part of the operation itself. |
| Example | A minimal printed snippet, using the printer’s actual v0/v1/… naming. |
| Operands | One row per input or structural participant in the printed syntax. Value operands list the narrowest type the operation guarantees (default i256; narrower widths only appear when type inference has narrowed an upstream definition). Vector-of-operands fields show Vec<…> as the type. Non-value participants such as nested regions are listed with an em-dash type to mark them as structural rather than as operands. |
| Result and purity | The type the operation produces (or none for statements that bind no value), followed by a purity label, either Pure or Effectful. Pure operations may be reordered, deduplicated, or eliminated by the simplifier; effectful ones may not. Effectful entries may carry a parenthetical describing the nature of the side effect when informative (e.g. “control flow”, “terminator”, or a note about revert/trap behavior). |
| Annotations | Operation-specific fields the printer surfaces as /* ... */ comments in the dump (region tag for memory ops, static-slot hint for storage ops, type suffix for non-default widths). Listed here as a table of source field → printed form. |
Syntax notation
Syntax templates in each entry use the following conventions:
| Notation | Meaning |
|---|---|
| add, mload, if, else, case, let, yield, … | Literal printer tokens: bare lowercase identifiers and keywords that the printer emits verbatim. |
| $offset, $value, $key, $lhs, $rhs, … | Role names ($-prefixed): placeholders for SSA value references the printer renders as v followed by a decimal id (v0, v1, …). |
| <type>, <region>, <hex>, <id>, <bits>, <func_name>, <N>, <length>, … | Metavariables: stand for compile-time fields (type tags, hex values, identifier strings, integer counts), not SSA values. The concrete values they take are enumerated in the Annotations section of each entry or in the type system reference. |
| […] | Optional parts. Anything inside the brackets may or may not appear in any given dump, depending on the conditions described in the operation's Annotations section. |
| [: <type>] | Optional type suffix on a value reference. Suppressed when the value's type is the default i256 integer; present otherwise (: i32, : ptr<heap>, …). |
| /* … */ | Debug-only annotations the printer attaches to certain operations (memory region tag, static-slot hint, etc.). |
| … | Repetition: "more entries of the same shape." Used in vector operand lists ($arg_0, $arg_1, …) and in multi-line block bodies ({ … }). |
For instance, this template:
```text
let $result[: <type>] := and($lhs[: <type>], $rhs[: <type>])
```
prints as:
```text
let v2: i8 := and(v0, v1: i8)
```
Here $result is rendered as v2 with an i8 type suffix, $lhs as v0 at the default i256 (type suffix omitted), and $rhs as v1 with an i8 type suffix.
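The suffix rule fits in a few lines. The sketch below is a hypothetical re-implementation for integer types only; pointer types and debug annotations are left out, and it is not the code in printer.rs. It reproduces the example above.

```rust
// Hypothetical re-implementation of the value-suffix rule; not printer.rs.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty { I1, I8, I32, I64, I128, I160, I256 }

impl Ty {
    fn name(self) -> &'static str {
        match self {
            Ty::I1 => "i1",
            Ty::I8 => "i8",
            Ty::I32 => "i32",
            Ty::I64 => "i64",
            Ty::I128 => "i128",
            Ty::I160 => "i160",
            Ty::I256 => "i256",
        }
    }
}

/// Renders an SSA reference as `v<id>`, adding `: <type>` unless the type is i256.
fn value_ref(id: u32, ty: Ty) -> String {
    if ty == Ty::I256 {
        format!("v{}", id)
    } else {
        format!("v{}: {}", id, ty.name())
    }
}

/// Renders `let $result := op($lhs, $rhs)` following the notation above.
fn print_binop(op: &str, result: (u32, Ty), lhs: (u32, Ty), rhs: (u32, Ty)) -> String {
    format!(
        "let {} := {}({}, {})",
        value_ref(result.0, result.1),
        op,
        value_ref(lhs.0, lhs.1),
        value_ref(rhs.0, rhs.1)
    )
}
```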
Note
A value's printed width is use-driven. Type inference assigns each value a forward width from its definition, then widens it to satisfy its uses. The type suffix shown for a value in an example (such as i8) is therefore only illustrative. A short example may not show the uses that determine it, and the same operation can appear with a wider suffix, or none (it is omitted for the default i256), in another program.

For instance, a value used as a memory offset widens to i64; as an address (a call target, extcodesize) to i160; stored as a full word (an mstore/sstore value) to i256; and an add/mul operand up to the i64 register width.
Operation index
Pure expressions
Constants and variables
Arithmetic
add, sub, mul, div, sdiv, mod, smod, exp, and, or, xor, shl, shr, sar, lt, gt, slt, sgt, eq, byte, signextend, addmod, mulmod, iszero, not, clz
Bit-width conversions
Hashing
Environment reads
caller, callvalue, origin, address, chainid, gas, msize, coinbase, timestamp, number, difficulty, gaslimit, basefee, blobbasefee, blobhash, blockhash, selfbalance, gasprice
Calldata, returndata, and code
Memory and storage loads
Linker
Function call
Memory and storage writes
Bulk copies
Bindings and wrappers
Structured control flow
External interaction
Termination
Type system
Every value in the IR carries a Type. The operation entries below refer to widths (i1…i256), address spaces (ptr<heap>, etc.), and memory regions (scratch, etc.) by their printed form; this section is the reference for those names.
Type
The umbrella enum, with these variants:
| Variant | Printed as | Description |
|---|---|---|
| Int(BitWidth) | i1, i8, …, i256 | An integer at one of the BitWidth widths. |
| Ptr(AddressSpace) | ptr<heap>, ptr<stack>, ptr<storage>, ptr<code> | A pointer tagged with its address space; see AddressSpace. |
| Void | void | Unit type. Used for statements that produce no value and for void-returning functions. |
BitWidth
The rungs of integer width. Newly minted values default to I256; type inference narrows them down to one of the lower rungs when it can prove the upper bits are zero or unused.
| Variant | Printed as | Typical use |
|---|---|---|
| I1 | i1 | Boolean. Result type of every comparison and iszero. |
| I8 | i8 | Byte values. The narrowest meaningful integer. |
| I32 | i32 | PolkaVM pointer width (XLEN); minimum width for function parameters under the rv64e ABI. |
| I64 | i64 | PolkaVM native register width; most narrowed values land here. |
| I128 | i128 | Two registers; arithmetic that overflows i64 but doesn't need full 256-bit emulation. |
| I160 | i160 | Ethereum addresses; result of caller, origin, mapping keys. |
| I256 | i256 | EVM word width. The default and conservative ceiling. |
AddressSpace
The address space a pointer points into. Carried on every Ptr value so the codegen can lower loads and stores without a separate alias-analysis pass.
| Variant | Printed as | Points into | Endianness |
|---|---|---|---|
Heap | ptr<heap> | Emulated EVM linear memory (the simulated mload/mstore region). | Big-endian (per EVM semantics). |
Stack | ptr<stack> | Native PolkaVM stack allocations. | Little-endian (no swap). |
Storage | ptr<storage> | Contract storage; key/value with 256-bit slots. | Big-endian on the wire. |
Code | ptr<code> | Read-only code/data segment. | Big-endian. |
MemoryRegion
A refinement carried by every memory load and store on top of AddressSpace::Heap. The tag tells later passes what kind of heap address an offset is hitting, which drives both free-memory-pointer propagation and byte-swap elimination.
| Variant | Address range | Printed as | Meaning |
|---|---|---|---|
Scratch | 0x00–0x3f | /* scratch */ | EVM scratch space; safe to touch without consulting the free memory pointer. |
FreePointerSlot | exactly 0x40 | /* free_ptr */ | Slot that stores the free memory pointer itself. |
Dynamic | 0x80 and above | /* dynamic */ | Real heap allocations. |
Unknown | everything else (constants in 0x41–0x7f, plus all non-constant offsets) | (suppressed) | Conservative fallback used when the offset isn’t a constant or doesn’t slot cleanly. |
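The address-range rules amount to a small classifier. The following sketch assumes a `u64` constant offset for brevity: offsets are `i256` in the IR, and any larger constant would also fall under Dynamic. It uses `None` to stand for a non-constant offset. The names `classify` and `printed` are illustrative and are not the compiler's API.

```rust
/// Heap-region tags, mirroring the MemoryRegion table.
#[derive(Debug, PartialEq, Eq)]
enum MemoryRegion {
    Scratch,
    FreePointerSlot,
    Dynamic,
    Unknown,
}

impl MemoryRegion {
    /// The comment annotation shown in dumps; Unknown is suppressed.
    fn printed(&self) -> Option<&'static str> {
        match self {
            MemoryRegion::Scratch => Some("/* scratch */"),
            MemoryRegion::FreePointerSlot => Some("/* free_ptr */"),
            MemoryRegion::Dynamic => Some("/* dynamic */"),
            MemoryRegion::Unknown => None,
        }
    }
}

/// Classify a heap offset; `None` models a non-constant offset (hypothetical helper).
fn classify(offset: Option<u64>) -> MemoryRegion {
    match offset {
        Some(0x00..=0x3f) => MemoryRegion::Scratch,
        Some(0x40) => MemoryRegion::FreePointerSlot,
        Some(o) if o >= 0x80 => MemoryRegion::Dynamic,
        // Constants in 0x41..=0x7f and every non-constant offset.
        _ => MemoryRegion::Unknown,
    }
}
```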
Pure expressions
Pure expressions produce values without side effects. The simplifier may freely reorder, deduplicate, and eliminate them. They appear on the right-hand side of a let binding, or as operands of other expressions and effectful statements; the operand positions accept SSA value references only, so any pure expression that is consumed elsewhere is first bound by a let. Examples in this section wrap each expression in a let v := … to give it somewhere to land.
0x<hex>
(Expression::Literal)
Description
A compile-time constant value with a declared type. New literals minted by the translator default to Int(I256); passes that synthesize constants at narrower widths (e.g. a one-bit boolean from a constant comparison) attach the narrower type directly.
Syntax
0x<hex>[: <type>]
Example
let v0: i8 := 0x2a
let v1: i1 := 0x1 // boolean true
let v2: i64 := 0x80
Operands
None — literals are leaves.
Result and purity
| Result | Purity |
|---|---|
Same as the literal’s value_type | Pure |
Annotations
| Source field | Printed as |
|---|---|
value: BigUint | 0x<hex> in the syntax position (not a comment annotation; it is the expression itself) |
value_type: Type | : <type> suffix when value_type is not the default Int(I256); suppressed otherwise |
v<id>
(Expression::Var)
Description
A reference to an existing SSA value, used as the entire right-hand side of a let. In a typical dump this is rare because the simplifier collapses let v := v<id> into the consumers of v via copy propagation; expect to see it only in dumps taken before simplification has run.
Syntax
v<id>
Example
let v5 := v3 // copy; usually eliminated by simplify
Operands
None — the expression is the value reference itself.
Result and purity
| Result | Purity |
|---|---|
| Same as the referenced value’s type | Pure |
Annotations
None.
add
(Expression::Binary with BinaryOperation::Add)
Description
Modular addition. Wraps on overflow; per EVM, the result is (lhs + rhs) mod 2^256 regardless of operand widths.
Syntax
add($lhs[: <type>], $rhs[: <type>])
Example
let v2 := add(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | — |
rhs | i256 | — |
Result and purity
| Result | Purity |
|---|---|
widen_by_one(max(width(lhs), width(rhs))) — one tier above the wider operand to account for the carry bit | Pure |
Annotations
None.
sub
(Expression::Binary with BinaryOperation::Sub)
Description
Modular subtraction. Wraps on underflow; the result is (lhs - rhs) mod 2^256 regardless of operand widths.
Syntax
sub($lhs[: <type>], $rhs[: <type>])
Example
let v2 := sub(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | — |
rhs | i256 | — |
Result and purity
| Result | Purity |
|---|---|
i256 — conservative; underflow on narrower operands could borrow into upper bits | Pure |
Annotations
None.
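A short sketch shows the mod 2^256 wrapping of add and sub, along with the reason sub's result is kept at i256. It uses a hypothetical four-limb `U256` (least-significant limb first), which is an illustration and not the compiler's own representation. Subtracting 1 from 0 borrows through every upper limb, so even narrow operands can produce a full-width result.

```rust
/// A 256-bit word as four u64 limbs, least-significant limb first (illustrative layout).
type U256 = [u64; 4];

/// (lhs + rhs) mod 2^256: the carry out of the top limb is dropped.
fn wrapping_add(lhs: U256, rhs: U256) -> U256 {
    let mut out = [0u64; 4];
    let mut carry = false;
    for i in 0..4 {
        let (s1, c1) = lhs[i].overflowing_add(rhs[i]);
        let (s2, c2) = s1.overflowing_add(carry as u64);
        out[i] = s2;
        carry = c1 || c2;
    }
    out
}

/// (lhs - rhs) mod 2^256: the borrow out of the top limb is dropped.
fn wrapping_sub(lhs: U256, rhs: U256) -> U256 {
    let mut out = [0u64; 4];
    let mut borrow = false;
    for i in 0..4 {
        let (d1, b1) = lhs[i].overflowing_sub(rhs[i]);
        let (d2, b2) = d1.overflowing_sub(borrow as u64);
        out[i] = d2;
        borrow = b1 || b2;
    }
    out
}
```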
mul
(Expression::Binary with BinaryOperation::Mul)
Description
Modular multiplication. The result is (lhs * rhs) mod 2^256.
Syntax
mul($lhs[: <type>], $rhs[: <type>])
Example
let v2 := mul(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | — |
rhs | i256 | — |
Result and purity
| Result | Purity |
|---|---|
double_width(max(width(lhs), width(rhs))) — the tier holding twice the wider operand’s bits (skipping i160 at the i128 → i256 transition) | Pure |
Annotations
None.
div
(Expression::Binary with BinaryOperation::Div)
Description
Unsigned integer division. Per EVM, div(x, 0) = 0 (no trap on division by zero).
Syntax
div($lhs[: <type>], $rhs[: <type>])
Example
let v2 := div(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | Dividend. |
rhs | i256 | Divisor; 0 yields a result of 0, not a trap. |
Result and purity
| Result | Purity |
|---|---|
width(lhs) — the quotient cannot exceed the dividend | Pure |
Annotations
None.
sdiv
(Expression::Binary with BinaryOperation::SDiv)
Description
Signed two’s-complement integer division. Per EVM, sdiv(x, 0) = 0; quotient is truncated toward zero.
Syntax
sdiv($lhs[: <type>], $rhs[: <type>])
Example
let v2 := sdiv(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | Dividend, treated as signed. |
rhs | i256 | Divisor, treated as signed; 0 yields 0. |
Result and purity
| Result | Purity |
|---|---|
max(width(lhs), width(rhs)) — a negative divisor can push the result to full width | Pure |
Annotations
None.
mod
(Expression::Binary with BinaryOperation::Mod)
Description
Unsigned modulo. Per EVM, mod(x, 0) = 0.
Syntax
mod($lhs[: <type>], $rhs[: <type>])
Example
let v2 := mod(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | Dividend. |
rhs | i256 | Divisor; 0 yields 0. |
Result and purity
| Result | Purity |
|---|---|
width(lhs) | Pure |
Annotations
None.
smod
(Expression::Binary with BinaryOperation::SMod)
Description
Signed modulo. Per EVM, smod(x, 0) = 0; the result takes the sign of the dividend.
Syntax
smod($lhs[: <type>], $rhs[: <type>])
Example
let v2 := smod(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | Dividend, treated as signed. |
rhs | i256 | Divisor, treated as signed; 0 yields 0. |
Result and purity
| Result | Purity |
|---|---|
width(lhs) | Pure |
Annotations
None.
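The divide-by-zero rule shared by div, sdiv, mod, and smod can be sketched on a 128-bit stand-in for the EVM word, an assumption made for brevity since the IR operates on i256. Rust's signed `/` and `%` already truncate toward zero and give the remainder the dividend's sign. As a result, only the zero divisor and the MIN / -1 overflow need explicit handling, and `wrapping_div` reproduces the EVM result for the latter.

```rust
// EVM-style division on a 128-bit stand-in; the IR operations work on 256-bit words.

/// div(lhs, rhs): unsigned quotient, 0 when rhs is 0.
fn div(lhs: u128, rhs: u128) -> u128 {
    if rhs == 0 { 0 } else { lhs / rhs }
}

/// mod(lhs, rhs): named `modulo` because `mod` is a Rust keyword.
fn modulo(lhs: u128, rhs: u128) -> u128 {
    if rhs == 0 { 0 } else { lhs % rhs }
}

/// sdiv(lhs, rhs): truncates toward zero; MIN / -1 wraps back to MIN.
fn sdiv(lhs: i128, rhs: i128) -> i128 {
    if rhs == 0 { 0 } else { lhs.wrapping_div(rhs) }
}

/// smod(lhs, rhs): the result takes the sign of the dividend.
fn smod(lhs: i128, rhs: i128) -> i128 {
    if rhs == 0 { 0 } else { lhs.wrapping_rem(rhs) }
}
```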
exp
(Expression::Binary with BinaryOperation::Exp)
Description
Modular exponentiation: (base ^ exponent) mod 2^256. The most expensive arithmetic opcode in EVM: its gas cost grows linearly with the byte length of exponent.
Syntax
exp($base[: <type>], $exponent[: <type>])
Example
let v2 := exp(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
base | i256 | Base. |
exponent | i256 | Exponent. |
Result and purity
| Result | Purity |
|---|---|
i256 — conservative; exponentiation can fill any width | Pure |
Annotations
None.
and
(Expression::Binary with BinaryOperation::And)
Description
Bitwise AND. The common idiom for type narrowing: a constant mask on the right lets forward analysis pick up a tight result width.
Syntax
and($lhs[: <type>], $rhs[: <type>])
Example
let v2 := and(v0, v1)
let v3: i8 := 0xff
let v4: i8 := and(v0, v3: i8)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | — |
rhs | i256 | — |
Result and purity
| Result | Purity |
|---|---|
min(width(lhs), width(rhs)) — AND can only clear bits, so the result fits in the narrower operand | Pure |
Annotations
None.
or
(Expression::Binary with BinaryOperation::Or)
Description
Bitwise OR.
Syntax
or($lhs[: <type>], $rhs[: <type>])
Example
let v2 := or(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | — |
rhs | i256 | — |
Result and purity
| Result | Purity |
|---|---|
max(width(lhs), width(rhs)) | Pure |
Annotations
None.
xor
(Expression::Binary with BinaryOperation::Xor)
Description
Bitwise XOR.
Syntax
xor($lhs[: <type>], $rhs[: <type>])
Example
let v2 := xor(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | — |
rhs | i256 | — |
Result and purity
| Result | Purity |
|---|---|
max(width(lhs), width(rhs)) | Pure |
Annotations
None.
shl
(Expression::Binary with BinaryOperation::Shl)
Description
Logical left shift. Operand order follows EVM: shl(shift, value) computes value << shift. Shifts ≥ 256 produce 0.
Syntax
shl($shift[: <type>], $value[: <type>])
Example
let v2 := shl(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
shift | i256 | Shift amount in bits. |
value | i256 | Value to shift. |
Result and purity
| Result | Purity |
|---|---|
i256 — conservative; bits may shift into any width | Pure |
Annotations
None.
shr
(Expression::Binary with BinaryOperation::Shr)
Description
Logical right shift. Operand order follows EVM: shr(shift, value) computes value >> shift with zero-fill from the left. Shifts ≥ 256 produce 0.
Syntax
shr($shift[: <type>], $value[: <type>])
Example
let v2 := shr(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
shift | i256 | Shift amount in bits. |
value | i256 | Value to shift. |
Result and purity
| Result | Purity |
|---|---|
If shift is a known constant k: tier holding 256 - k bits (or i1 for k ≥ 256). Otherwise: width(value). | Pure |
Annotations
None.
sar
(Expression::Binary with BinaryOperation::Sar)
Description
Arithmetic (signed) right shift. Operand order follows EVM: sar(shift, value) shifts value right by shift bits, preserving the sign bit. Shifts ≥ 256 saturate to 0 for non-negative values and to -1 (all-ones) for negative values.
Syntax
sar($shift[: <type>], $value[: <type>])
Example
let v2 := sar(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
shift | i256 | Shift amount in bits. |
value | i256 | Value to shift, treated as signed. |
Result and purity
| Result | Purity |
|---|---|
width(value) — unlike shr, sign-extension means a constant shift cannot narrow the result | Pure |
Annotations
None.
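The operand order and out-of-range behaviour of the three shifts can be seen in a sketch on a 128-bit stand-in word, an assumption for brevity. Here a shift of 128 or more plays the role of the EVM's 256 or more. Rust panics in debug builds on a shift amount at or above the type width, so the saturating cases are guarded explicitly.

```rust
// 128-bit stand-in for the 256-bit EVM word: shifts >= WIDTH model shifts >= 256.
const WIDTH: u128 = 128;

/// shl(shift, value): value << shift, 0 once shift reaches the word width.
fn shl(shift: u128, value: u128) -> u128 {
    if shift >= WIDTH { 0 } else { value << shift }
}

/// shr(shift, value): logical right shift with zero fill from the left.
fn shr(shift: u128, value: u128) -> u128 {
    if shift >= WIDTH { 0 } else { value >> shift }
}

/// sar(shift, value): arithmetic right shift; saturates to 0 or -1 (all ones).
fn sar(shift: u128, value: i128) -> i128 {
    if shift >= WIDTH {
        if value < 0 { -1 } else { 0 }
    } else {
        value >> shift // `>>` on i128 is arithmetic in Rust
    }
}
```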
lt
(Expression::Binary with BinaryOperation::Lt)
Description
Unsigned less-than comparison. Returns 1 if lhs < rhs, else 0.
Syntax
lt($lhs[: <type>], $rhs[: <type>])
Example
let v2: i1 := lt(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | Compared unsigned. |
rhs | i256 | Compared unsigned. |
Result and purity
| Result | Purity |
|---|---|
i1 | Pure |
Annotations
None.
gt
(Expression::Binary with BinaryOperation::Gt)
Description
Unsigned greater-than comparison. Returns 1 if lhs > rhs, else 0.
Syntax
gt($lhs[: <type>], $rhs[: <type>])
Example
let v2: i1 := gt(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | Compared unsigned. |
rhs | i256 | Compared unsigned. |
Result and purity
| Result | Purity |
|---|---|
i1 | Pure |
Annotations
None.
slt
(Expression::Binary with BinaryOperation::Slt)
Description
Signed less-than comparison. Returns 1 if lhs < rhs with both operands treated as two’s-complement signed integers, else 0.
Syntax
slt($lhs[: <type>], $rhs[: <type>])
Example
let v2: i1 := slt(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | Compared signed. |
rhs | i256 | Compared signed. |
Result and purity
| Result | Purity |
|---|---|
i1 | Pure |
Annotations
None.
sgt
(Expression::Binary with BinaryOperation::Sgt)
Description
Signed greater-than comparison. Returns 1 if lhs > rhs with both operands treated as two’s-complement signed integers, else 0.
Syntax
sgt($lhs[: <type>], $rhs[: <type>])
Example
let v2: i1 := sgt(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | Compared signed. |
rhs | i256 | Compared signed. |
Result and purity
| Result | Purity |
|---|---|
i1 | Pure |
Annotations
None.
eq
(Expression::Binary with BinaryOperation::Eq)
Description
Equality comparison. Returns 1 if lhs == rhs, else 0. Signedness is irrelevant.
Syntax
eq($lhs[: <type>], $rhs[: <type>])
Example
let v2: i1 := eq(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
lhs | i256 | — |
rhs | i256 | — |
Result and purity
| Result | Purity |
|---|---|
i1 | Pure |
Annotations
None.
byte
(Expression::Binary with BinaryOperation::Byte)
Description
Extract a single byte from a 256-bit word. byte(i, x) returns the i-th byte of x with byte 0 being the most significant. If i ≥ 32, the result is 0.
Syntax
byte($index[: <type>], $word[: <type>])
Example
let v2: i8 := byte(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
index | i256 | Byte position; 0 = most significant byte. Values ≥ 32 yield 0. |
word | i256 | Source word. |
Result and purity
| Result | Purity |
|---|---|
i8 | Pure |
Annotations
None.
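Because byte 0 is the most significant byte, evaluating byte on a numeric word means counting down from the top. The sketch below assumes a hypothetical layout of four u64 limbs stored least-significant first, as a native little-endian representation might hold them. It is not taken from the compiler.

```rust
/// A 256-bit value as four u64 limbs, least-significant limb first (hypothetical layout).
type U256 = [u64; 4];

/// byte(i, x): byte 0 is the most significant, so index i counts down from the
/// top limb. Indices >= 32 yield 0.
fn byte(index: u64, word: &U256) -> u8 {
    if index >= 32 {
        return 0;
    }
    let i = index as usize;
    let limb = word[3 - i / 8]; // limb that contains byte i
    let shift = (7 - i % 8) * 8; // byte 0 of a limb is its top byte
    (limb >> shift) as u8
}
```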
signextend
(Expression::Binary with BinaryOperation::SignExtend)
Description
Sign-extend an integer from a byte position. Per EVM, signextend(b, x) treats byte b of x as the most significant byte of a smaller signed integer and extends its sign through the upper bytes.
Syntax
signextend($byte_position[: <type>], $value[: <type>])
Example
let v2 := signextend(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
byte_position | i256 | Byte position of the sign byte (0–31). |
value | i256 | Source value. |
Result and purity
| Result | Purity |
|---|---|
i256 — the extended value occupies the full word | Pure |
Annotations
None. The width-targeted sign-extension primitive sext<i<bits>> (Expression::SignExtendTo) is a separate operation; see the bit-width conversions section.
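The EVM signextend rule can be sketched on a big-endian 32-byte word. Byte position b counts from the least significant end, so it lives at array index 31 - b, and every byte above it is filled with that byte's top bit. For b of 31 or more the word is returned unchanged. This is an illustrative sketch, not the compiler's lowering.

```rust
/// signextend(b, x) on a big-endian 32-byte word (illustrative sketch).
fn signextend(b: u64, word: [u8; 32]) -> [u8; 32] {
    if b >= 31 {
        return word; // the sign byte is already the top byte: nothing to extend
    }
    let sign_idx = 31 - b as usize; // array index of byte b, counted from the low end
    let fill = if (word[sign_idx] & 0x80) != 0 { 0xff } else { 0x00 };
    let mut out = word;
    // Overwrite every byte more significant than the sign byte.
    for byte in out.iter_mut().take(sign_idx) {
        *byte = fill;
    }
    out
}
```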
addmod
(Expression::Ternary with BinaryOperation::AddMod)
Description
Ternary modular addition: (a + b) mod n, computed without intermediate overflow. Per EVM, n = 0 yields 0.
Syntax
addmod($a[: <type>], $b[: <type>], $n[: <type>])
Example
let v3 := addmod(v0, v1, v2)
Operands
| Name | Type | Notes |
|---|---|---|
a | i256 | First addend. |
b | i256 | Second addend. |
n | i256 | Modulus; 0 yields 0. |
Result and purity
| Result | Purity |
|---|---|
i256 — conservative | Pure |
Annotations
None.
mulmod
(Expression::Ternary with BinaryOperation::MulMod)
Description
Ternary modular multiplication: (a * b) mod n, computed without intermediate overflow. Per EVM, n = 0 yields 0.
Syntax
mulmod($a[: <type>], $b[: <type>], $n[: <type>])
Example
let v3 := mulmod(v0, v1, v2)
Operands
| Name | Type | Notes |
|---|---|---|
a | i256 | First factor. |
b | i256 | Second factor. |
n | i256 | Modulus; 0 yields 0. |
Result and purity
| Result | Purity |
|---|---|
i256 — conservative | Pure |
Annotations
None.
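"Computed without intermediate overflow" means the sum or product is formed at a wider width before the modulus is taken. The sketch below shows this on u64 operands with a u128 intermediate, a scaled-down stand-in. The real 256-bit operations need a 257-bit sum and a 512-bit product.

```rust
/// addmod(a, b, n) on 64-bit operands: the sum is formed in u128, so it cannot wrap.
fn addmod(a: u64, b: u64, n: u64) -> u64 {
    if n == 0 {
        return 0;
    }
    ((a as u128 + b as u128) % n as u128) as u64
}

/// mulmod(a, b, n) on 64-bit operands: the full product fits in u128.
fn mulmod(a: u64, b: u64, n: u64) -> u64 {
    if n == 0 {
        return 0;
    }
    ((a as u128 * b as u128) % n as u128) as u64
}
```

A naive `(a + b) % n` on u64 would first wrap `u64::MAX + 1` to 0. The wider intermediate keeps the true sum 2^64, whose remainder modulo 10 is 6.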
iszero
(Expression::Unary with UnaryOperation::IsZero)
Description
Returns 1 if the operand is 0, else 0. Also serves as the logical NOT for boolean values.
Syntax
iszero($operand[: <type>])
Example
let v1: i1 := iszero(v0)
Operands
| Name | Type | Notes |
|---|---|---|
operand | i256 | — |
Result and purity
| Result | Purity |
|---|---|
i1 | Pure |
Annotations
None.
not
(Expression::Unary with UnaryOperation::Not)
Description
Bitwise complement. Inverts every bit; equivalent to xor(operand, 2^256 - 1).
Syntax
not($operand[: <type>])
Example
let v1 := not(v0)
Operands
| Name | Type | Notes |
|---|---|---|
operand | i256 | — |
Result and purity
| Result | Purity |
|---|---|
i256 — the complement fills the full word regardless of operand width | Pure |
Annotations
None.
clz
(Expression::Unary with UnaryOperation::Clz)
Description
Count leading zeros. Returns the number of leading zero bits in the operand, where a value of 0 returns 256 (the full width). Not an EVM opcode; reaches newyork as a Yul builtin (FunctionName::Clz) and is translated directly by the Yul-to-newyork translator.
Syntax
clz($operand[: <type>])
Example
let v1 := clz(v0)
Operands
| Name | Type | Notes |
|---|---|---|
operand | i256 | — |
Result and purity
| Result | Purity |
|---|---|
i256 — in practice the value fits in nine bits (max 256), so type inference often narrows further | Pure |
Annotations
None.
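The behaviour of clz on zero (256) follows from scanning from the most significant end. The following minimal sketch uses a hypothetical four-limb layout, least-significant limb first, and relies on `u64::leading_zeros` for the first non-zero limb.

```rust
/// A 256-bit value as four u64 limbs, least-significant limb first (hypothetical layout).
type U256 = [u64; 4];

/// Count leading zero bits of a 256-bit word; a zero word yields 256.
fn clz(word: &U256) -> u32 {
    let mut count = 0;
    for &limb in word.iter().rev() {
        if limb == 0 {
            count += 64; // an all-zero limb contributes all of its bits
        } else {
            return count + limb.leading_zeros();
        }
    }
    count
}
```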
truncate<i<bits>>
(Expression::Truncate)
Description
Reinterpret a wider integer as a narrower one by discarding the upper bits. The destination width is carried in the IR’s to: BitWidth field and is rendered inside the angle brackets of the printer mnemonic. Narrowing-only; the source width must be greater than or equal to the destination width.
Syntax
truncate<i<bits>>($value[: <type>])
Example
let v1: i64 := truncate<i64>(v0)
let v2: i8 := truncate<i8>(v1: i64)
Operands
| Name | Type | Notes |
|---|---|---|
value | i256 | Source value; must be at least as wide as the destination. |
Result and purity
| Result | Purity |
|---|---|
The destination width from the to field | Pure |
Annotations
None. The destination width is part of the operation name, not a debug annotation.
zext<i<bits>>
(Expression::ZeroExtend)
Description
Reinterpret a narrower integer as a wider one by zero-filling the upper bits. The destination width is carried in the IR’s to: BitWidth field. Widening-only.
Syntax
zext<i<bits>>($value[: <type>])
Example
let v1 := zext<i256>(v0: i8)
Operands
| Name | Type | Notes |
|---|---|---|
value | i256 | Source value; must be no wider than the destination. |
Result and purity
| Result | Purity |
|---|---|
The destination width from the to field | Pure |
Annotations
None.
sext<i<bits>>
(Expression::SignExtendTo)
Description
Reinterpret a narrower signed integer as a wider one by sign-extending the high bit. The destination width is carried in the IR’s to: BitWidth field. Distinct from signextend (Expression::Binary), which is the EVM byte-position primitive; this one specifies the destination width directly and is introduced by passes that produce a sign-extended value at a known target width.
Syntax
sext<i<bits>>($value[: <type>])
Example
let v1 := sext<i256>(v0: i64)
Operands
| Name | Type | Notes |
|---|---|---|
value | i256 | Source value; must be no wider than the destination. |
Result and purity
| Result | Purity |
|---|---|
The destination width from the to field | Pure |
Annotations
None.
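Rust's integer `as` casts follow the same three rules as truncate, zext, and sext, which makes them a convenient way to see the bit patterns. The function names below are illustrative only. Note that zext and sext differ only in how the upper bits are filled: the same 64-bit pattern with its top bit set extends to very different 128-bit values.

```rust
/// truncate<i8>: keep the low 8 bits, discard the rest.
fn truncate_to_i8(v: u64) -> u8 {
    v as u8
}

/// zext<i128>: the upper bits are filled with zeros.
fn zext_to_i128(v: u64) -> u128 {
    v as u128
}

/// sext<i128>: the upper bits are filled with copies of the sign bit.
fn sext_to_i128(v: i64) -> i128 {
    v as i128
}
```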
keccak256
(Expression::Keccak256)
Description
Compute the Keccak-256 hash of length bytes of emulated EVM linear memory starting at offset. The general-purpose hashing primitive; the specialized variants below cover the common scratch-space patterns more compactly.
Syntax
keccak256($offset[: <type>], $length[: <type>])
Example
let v2 := keccak256(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
offset | i256 | Byte offset into linear memory; forward analysis widens to at least i64. |
length | i256 | Length of the region to hash, in bytes; forward analysis widens to at least i64. |
Result and purity
| Result | Purity |
|---|---|
i256 | Pure — the hash is a deterministic function of the memory contents at evaluation time. Passes that hoist or dedupe must respect intervening memory writes. |
Annotations
None.
keccak256_pair
(Expression::Keccak256Pair)
Description
Compound hash of two 256-bit words. Equivalent to mstore(0, word0); mstore(32, word1); keccak256(0, 64) but emitted as a single outlined call after mem_opt’s keccak fusion recognizes the pattern. The mapping-key idiom; see also mapping_sload.
Syntax
keccak256_pair($word0[: <type>], $word1[: <type>])
Example
let v2 := keccak256_pair(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
word0 | i256 | First word; bytes 0–31 of the 64-byte hash input (stored at offset 0). |
word1 | i256 | Second word; bytes 32–63 of the 64-byte hash input (stored at offset 32). |
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
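The 64-byte preimage that keccak256_pair hashes is the layout produced by mstore(0, word0); mstore(32, word1): each word is stored big-endian, with word0 first. The sketch below builds that buffer only. Computing Keccak-256 itself needs a hashing library and is left out. The `u128` argument to the hypothetical `word_be` helper is a stand-in for a full 256-bit value.

```rust
/// Encode a value as a 32-byte big-endian EVM word (u128 stand-in: upper 16 bytes zero).
fn word_be(v: u128) -> [u8; 32] {
    let mut w = [0u8; 32];
    w[16..].copy_from_slice(&v.to_be_bytes());
    w
}

/// The 64-byte input hashed by keccak256_pair: word0 in bytes 0..32, word1 in 32..64.
fn pair_preimage(word0: &[u8; 32], word1: &[u8; 32]) -> [u8; 64] {
    let mut buf = [0u8; 64];
    buf[..32].copy_from_slice(word0);
    buf[32..].copy_from_slice(word1);
    buf
}
```

For the mapping-key idiom, word0 is the key and word1 is the mapping's declared slot, matching the mstore order given for mapping_sload.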
keccak256_single
(Expression::Keccak256Single)
Description
Compound hash of a single 256-bit word. Equivalent to mstore(0, word0); keccak256(0, 32) but emitted as a single outlined call after mem_opt’s keccak fusion.
Syntax
keccak256_single($word0[: <type>])
Example
let v1 := keccak256_single(v0)
Operands
| Name | Type | Notes |
|---|---|---|
word0 | i256 | The word to hash. |
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
caller
(Expression::Caller)
Description
Address of the immediate caller of the current call frame.
Syntax
caller()
Example
let v0: i160 := caller()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i160 | Pure |
Annotations
None.
callvalue
(Expression::CallValue)
Description
Value (wei) attached to the current call.
Syntax
callvalue()
Example
let v0 := callvalue()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
origin
(Expression::Origin)
Description
Address of the original externally owned account that initiated the transaction.
Syntax
origin()
Example
let v0: i160 := origin()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i160 | Pure |
Annotations
None.
address
(Expression::Address)
Description
Address of the contract executing the current call frame.
Syntax
address()
Example
let v0: i160 := address()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i160 | Pure |
Annotations
None.
chainid
(Expression::ChainId)
Description
Chain identifier of the network the contract is executing on.
Syntax
chainid()
Example
let v0 := chainid()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
gas
(Expression::Gas)
Description
Remaining gas at the point of evaluation. Modeled as a pure expression for IR purposes; in practice it changes between evaluations, so any simplifier that deduplicates pure expressions must respect gas as a barrier.
Syntax
gas()
Example
let v0: i64 := gas()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i64 | Pure (per IR; see Description) |
Annotations
None.
msize
(Expression::MSize)
Description
Size, in bytes, of the emulated EVM linear memory touched so far: one past the highest accessed byte offset, rounded up to the next multiple of 32. Unlike gas, it is classified as side-effectful by the simplifier: unused msize() bindings are not eliminated, because the result depends on the program’s memory-access history and would change if the surrounding statements were reordered.
Syntax
msize()
Example
let v0: i64 := msize()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i64 | Effectful (see Description) |
Annotations
None.
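A small tracker shows why msize depends on access history: each access can only raise the rounded-up size, so moving an access changes what an earlier or later msize() observes. The `MemorySize` type is a hypothetical illustration. It follows the EVM rule that zero-length accesses do not expand memory.

```rust
/// Tracks the EVM msize value: touched memory rounded up to whole 32-byte words.
struct MemorySize {
    words: u64,
}

impl MemorySize {
    /// Record an access of `len` bytes at `offset`; zero-length accesses do not expand memory.
    fn touch(&mut self, offset: u64, len: u64) {
        if len == 0 {
            return;
        }
        let end = offset.saturating_add(len); // one past the last byte touched
        let words = end / 32 + u64::from(end % 32 != 0); // round up to a whole word
        self.words = self.words.max(words);
    }

    fn msize(&self) -> u64 {
        self.words * 32
    }
}
```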
coinbase
(Expression::Coinbase)
Description
Address of the block’s coinbase (block author).
Syntax
coinbase()
Example
let v0: i160 := coinbase()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i160 | Pure |
Annotations
None.
timestamp
(Expression::Timestamp)
Description
Block timestamp, in seconds since the Unix epoch.
Syntax
timestamp()
Example
let v0: i64 := timestamp()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i64 | Pure |
Annotations
None.
number
(Expression::Number)
Description
Current block number.
Syntax
number()
Example
let v0: i64 := number()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i64 | Pure |
Annotations
None.
difficulty
(Expression::Difficulty)
Description
Pre-merge block difficulty. On post-merge chains this is the block’s prevrandao value.
Syntax
difficulty()
Example
let v0 := difficulty()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
gaslimit
(Expression::GasLimit)
Description
Block gas limit.
Syntax
gaslimit()
Example
let v0: i64 := gaslimit()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i64 | Pure |
Annotations
None.
basefee
(Expression::BaseFee)
Description
Current block’s EIP-1559 base fee per gas.
Syntax
basefee()
Example
let v0 := basefee()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
blobbasefee
(Expression::BlobBaseFee)
Description
Current block’s EIP-4844 blob base fee per gas.
Syntax
blobbasefee()
Example
let v0 := blobbasefee()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
blobhash
(Expression::BlobHash)
Description
Versioned hash of the blob at the given index in the current transaction’s blob list.
Syntax
blobhash($index[: <type>])
Example
let v1 := blobhash(v0)
Operands
| Name | Type | Notes |
|---|---|---|
index | i256 | Blob index; forward analysis widens to at least i64. |
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
blockhash
(Expression::BlockHash)
Description
Hash of the block with the given number. Per EVM, valid only for the most recent 256 blocks; outside that range the result is 0.
Syntax
blockhash($number[: <type>])
Example
let v1 := blockhash(v0)
Operands
| Name | Type | Notes |
|---|---|---|
number | i256 | Block number; forward analysis widens to i256. |
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
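The EVM availability window for blockhash fits in one predicate: only the 256 most recent blocks qualify, and the current block never does. Outside that window the result is 0. The function name is illustrative and encodes the EVM rule stated above, not any PVM-specific behaviour.

```rust
/// Whether blockhash(n) can return a real hash while executing in block `current`.
fn blockhash_available(current: u64, n: u64) -> bool {
    n < current && current - n <= 256
}
```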
selfbalance
(Expression::SelfBalance)
Description
Balance (in wei) of the contract executing the current call frame. Cheaper than balance(address()).
Syntax
selfbalance()
Example
let v0 := selfbalance()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
gasprice
(Expression::GasPrice)
Description
Effective gas price of the current transaction.
Syntax
gasprice()
Example
let v0 := gasprice()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
calldataload
(Expression::CallDataLoad)
Description
Read 32 bytes from the current call’s calldata at the given offset. Reads past the end of calldata return zero bytes.
Syntax
calldataload($offset[: <type>])
Example
let v1 := calldataload(v0)
Operands
| Name | Type | Notes |
|---|---|---|
offset | i256 | Byte offset into calldata. |
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
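The zero-padding rule for calldataload can be sketched as a copy into a zeroed 32-byte word. This sketch assumes a `usize` offset, whereas the IR operand is `i256`. Any offset at or beyond the calldata length simply yields an all-zero word.

```rust
/// calldataload(offset): 32 bytes of calldata starting at `offset`; bytes past the end read as zero.
fn calldataload(calldata: &[u8], offset: usize) -> [u8; 32] {
    let mut word = [0u8; 32];
    if offset < calldata.len() {
        let avail = &calldata[offset..];
        let n = avail.len().min(32);
        word[..n].copy_from_slice(&avail[..n]); // remaining bytes stay zero
    }
    word
}
```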
calldatasize
(Expression::CallDataSize)
Description
Length of the current call’s calldata, in bytes.
Syntax
calldatasize()
Example
let v0: i64 := calldatasize()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i64 | Pure |
Annotations
None.
returndatasize
(Expression::ReturnDataSize)
Description
Length of the most recently returned data buffer from a sub-call, in bytes. Modeled as pure per IR but reflects the last ExternalCall / Create result; consumers must respect that ordering.
Syntax
returndatasize()
Example
let v0: i64 := returndatasize()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i64 | Pure (per IR; see Description) |
Annotations
None.
codesize
(Expression::CodeSize)
Description
Size of the currently executing code, in bytes.
Syntax
codesize()
Example
let v0: i64 := codesize()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
i64 | Pure |
Annotations
None.
extcodesize
(Expression::ExtCodeSize)
Description
Size of the code deployed at the given address, in bytes. Returns 0 for accounts with no deployed code.
Syntax
extcodesize($address[: <type>])
Example
let v1: i64 := extcodesize(v0: i160)
Operands
| Name | Type | Notes |
|---|---|---|
address | i256 | Account address; forward analysis widens to at least i160. |
Result and purity
| Result | Purity |
|---|---|
i64 | Pure |
Annotations
None.
extcodehash
(Expression::ExtCodeHash)
Description
Keccak-256 hash of the code deployed at the given address. Returns 0 for non-existent accounts.
Syntax
extcodehash($address[: <type>])
Example
let v1 := extcodehash(v0: i160)
Operands
| Name | Type | Notes |
|---|---|---|
address | i256 | Account address; forward analysis widens to at least i160. |
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
balance
(Expression::Balance)
Description
Balance (in wei) of the given account address. Use selfbalance for the contract executing the current call frame (cheaper).
Syntax
balance($address[: <type>])
Example
let v1 := balance(v0: i160)
Operands
| Name | Type | Notes |
|---|---|---|
address | i256 | Account address; forward analysis widens to at least i160. |
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
None.
mload
(Expression::MLoad)
Description
Read a 32-byte word from emulated EVM linear memory at offset. The word is read big-endian per EVM semantics. Pure per IR, but reads after writes return the new value; the memory passes track read/write dependencies separately.
Syntax
mload($offset[: <type>]) [/* <region> */]
Example
let v1 := mload(v0: i64)
let v2: i32 := mload(v3: i64) /* free_ptr */
Operands
| Name | Type | Notes |
|---|---|---|
offset | i256 | Byte offset into linear memory; forward analysis widens to at least i64. |
Result and purity
| Result | Purity |
|---|---|
i32 when region is FreePointerSlot; i256 otherwise | Pure (per IR; see Description) |
Annotations
| Source field | Printed as |
|---|---|
region: MemoryRegion | /* scratch */ · /* free_ptr */ · /* dynamic */ (Unknown is suppressed) |
Same tagging rules as mstore. The region also determines the result width: a load from FreePointerSlot produces an i32 since the FMP fits in a pointer-sized word.
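The big-endian read, and why a FreePointerSlot load can be narrowed to i32, can be sketched as follows. Because the word is big-endian, a small value sits in its last bytes. This assumes the upper 28 bytes of the slot are zero. For simplicity, bytes beyond the buffer read as zero here, whereas EVM would expand memory. Both function names are illustrative.

```rust
/// mload(offset): read 32 bytes of emulated linear memory as a big-endian word.
/// Bytes beyond the buffer read as zero (a simplification; EVM expands memory).
fn mload(mem: &[u8], offset: usize) -> [u8; 32] {
    let mut word = [0u8; 32];
    for (i, b) in word.iter_mut().enumerate() {
        *b = mem.get(offset + i).copied().unwrap_or(0);
    }
    word
}

/// FreePointerSlot load narrowed to i32: the value occupies the last four bytes
/// of the big-endian word, assuming the upper 28 bytes are zero.
fn load_free_pointer(mem: &[u8]) -> u32 {
    let word = mload(mem, 0x40);
    u32::from_be_bytes([word[28], word[29], word[30], word[31]])
}
```

For example, Solidity's initial free memory pointer value of 0x80 is stored as a single 0x80 byte at address 0x5f, the last byte of the slot.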
sload
(Expression::SLoad)
Description
Read a 32-byte word from persistent contract storage at the given key. Pure per IR; reads after writes to the same slot return the new value.
Syntax
sload($key[: <type>]) [/* slot: 0x<hex> */]
Example
let v1 := sload(v0)
let v2 := sload(v3) /* slot: 0x0 */
Operands
| Name | Type | Notes |
|---|---|---|
key | i256 | Storage slot. |
Result and purity
| Result | Purity |
|---|---|
i256 | Pure (per IR; see Description) |
Annotations
| Source field | Printed as |
|---|---|
static_slot: Option<BigUint> | /* slot: 0x<hex> */ when set; suppressed otherwise |
Same tagging rules as sstore. The printer renders the annotation whenever the field is Some, and the deduplicator’s canonicalizer partitions signatures by slot. However, no pass currently writes Some(...), so the annotation does not appear in present-day dumps.
tload
(Expression::TLoad)
Description
Read a 32-byte word from transient storage at the given key. Transient storage is wiped at the end of the transaction; pair with tstore.
Syntax
tload($key[: <type>])
Example
let v1 := tload(v0)
Operands
| Name | Type | Notes |
|---|---|---|
key | i256 | Transient storage slot. |
Result and purity
| Result | Purity |
|---|---|
i256 | Pure (per IR; see Description) |
Annotations
None. The IR does not track a static slot for tload.
mapping_sload
(Expression::MappingSLoad)
Description
Compound load for a Solidity mapping element. Equivalent to mstore(0, key); mstore(32, slot); sload(keccak256(0, 64)), but emitted as a single outlined call once the mapping_access_outlining pass recognizes the pattern. The pass fuses a keccak256_pair (itself produced by mem_opt’s keccak fusion) with the sload that follows it, and only when that sload is the sole consumer of the intermediate hash.
Syntax
mapping_sload($key[: <type>], $slot[: <type>])
Example
let v2 := mapping_sload(v0: i160, v1)
Operands
| Name | Type | Notes |
|---|---|---|
key | i256 | Mapping key; often narrowed to i160 for address keys. |
slot | i256 | The mapping’s declared storage slot. |
Result and purity
| Result | Purity |
|---|---|
i256 | Pure (per IR; see Description) |
Annotations
None. The fused statement’s effective storage slot is the keccak hash of the key and the declared slot, which is never a compile-time constant; no static_slot hint is surfaced.
dataoffset
(Expression::DataOffset)
Description
Offset of a named data segment within the deployed code. The identifier is a string carried in the IR’s id: String field; the linker resolves it to a concrete offset.
Syntax
dataoffset("<id>")
Example
let v0 := dataoffset("MyContract_deployed")
Operands
None — the identifier is a quoted string literal in the syntax position, not an operand.
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
| Source field | Printed as |
|---|---|
id: String | The quoted identifier in the syntax position (not a comment annotation; it is the expression itself). |
datasize
(Expression::DataSize)
Description
Size of a named data segment within the deployed code, in bytes. The identifier is resolved by the linker.
Syntax
datasize("<id>")
Example
let v0: i64 := datasize("MyContract_deployed")
Operands
None — the identifier is a quoted string literal in the syntax position, not an operand.
Result and purity
| Result | Purity |
|---|---|
i64 | Pure |
Annotations
| Source field | Printed as |
|---|---|
id: String | The quoted identifier in the syntax position. |
loadimmutable
(Expression::LoadImmutable)
Description
Read the value of a named immutable variable. Immutables are written once during contract construction by SetImmutable and read afterwards via this expression.
Syntax
loadimmutable("<key>")
Example
let v0 := loadimmutable("MyContract.owner")
Operands
None — the key is a quoted string literal in the syntax position.
Result and purity
| Result | Purity |
|---|---|
i256 | Pure |
Annotations
| Source field | Printed as |
|---|---|
key: String | The quoted identifier in the syntax position. |
linkersymbol
(Expression::LinkerSymbol)
Description
Address of an external library, resolved by the linker. The path encodes the library’s source location and identifier.
Syntax
linkersymbol("<path>")
Example
let v0: i160 := linkersymbol("contracts/Library.sol:L")
Operands
None — the path is a quoted string literal in the syntax position.
Result and purity
| Result | Purity |
|---|---|
i160 | Pure |
Annotations
| Source field | Printed as |
|---|---|
path: String | The quoted path in the syntax position. |
<func_name>
(Expression::Call; the printer emits func_<id> when no function name is registered)
Description
Internal function call. Invokes a user-defined function declared earlier in the same object; the mnemonic is the function’s Yul-level name, or func_<id> if the printer has no name registered for the FunctionId. Distinct from call and the other EVM call-opcode statements, which cross the contract boundary.
Syntax
<func_name>([$argument_0[: <type>], $argument_1[: <type>], …])
Example
let v3 := abi_decode_uint256(v0, v1, v2)
let v4, v5 := returns_two(v0) // multi-return via let multi-binding
Operands
| Name | Type | Notes |
|---|---|---|
arguments | Vec<Value> | Zero or more argument values, in declaration order; each operand may carry a : <type> suffix. |
Result and purity
| Result | Purity |
|---|---|
One or more values, widths taken from the callee’s declared return types (or the inferred return widths, narrowed via the interprocedural pass). Falls back to i256 when the callee’s returns are unknown to type inference. | Effectful — the simplifier treats every call as side-effectful regardless of callee body, so unused call bindings are not DCE’d. The transitive purity of the callee is not tracked at the IR level. |
Annotations
| Source field | Printed as |
|---|---|
function: FunctionId | The callee’s name in the syntax position (or func_<id> if the printer has no name registered). |
Memory and storage writes
The operations in this section all modify external state: emulated EVM linear memory, persistent storage, or transient storage. They are statements (not expressions) and they are never pure. Simplification and deduplication never reorder them with respect to each other or with respect to reverts; the memory passes treat them as the side-effect boundary for their analyses.
mstore
(Statement::MStore)
Description
Write a 32-byte word to emulated EVM linear memory at offset. The word is stored big-endian, matching EVM semantics; the codegen handles the byte swap on PolkaVM’s little-endian RISC-V target.
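The byte-order point can be made concrete with a minimal Rust model of mstore over a byte vector. The model assumes the 256-bit value is held as four u64 limbs, least significant limb first. This is a hypothetical in-register layout, not necessarily the codegen's. Writing the most significant limb first, each limb through to_be_bytes, produces the big-endian EVM layout regardless of host endianness. Memory growth is simplified and gas is ignored.

```rust
/// Store a 256-bit word big-endian at `offset` (hypothetical model).
/// `word` holds four 64-bit limbs, least significant limb first.
fn mstore(memory: &mut Vec<u8>, offset: usize, word: [u64; 4]) {
    let end = offset + 32;
    // Simplification: grow to exactly the bytes touched (EVM grows in 32-byte words).
    if memory.len() < end {
        memory.resize(end, 0);
    }
    // Most significant limb goes to the lowest address; to_be_bytes
    // performs the per-limb byte swap on a little-endian host.
    for (i, limb) in word.iter().rev().enumerate() {
        let start = offset + i * 8;
        memory[start..start + 8].copy_from_slice(&limb.to_be_bytes());
    }
}

/// Inverse of `mstore`; assumes the 32 bytes at `offset` are in bounds.
fn mload(memory: &[u8], offset: usize) -> [u64; 4] {
    let mut word = [0u64; 4];
    for i in 0..4 {
        let start = offset + i * 8;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&memory[start..start + 8]);
        word[3 - i] = u64::from_be_bytes(buf);
    }
    word
}
```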
Syntax
mstore($offset[: <type>], $value[: <type>]) [/* <region> */]
Example
mstore(v0, v1) // Unknown region; no annotation printed
mstore(v2, v3) /* scratch */ // offset proven to land in 0x00..0x3f
mstore(v4, v5) /* free_ptr */ // offset is exactly 0x40
Operands
| Name | Type | Notes |
|---|---|---|
offset | i256 | Byte offset into linear memory; forward analysis widens to at least i64. |
value | i256 | The 32-byte word to store. Narrower values are zero-extended at codegen time. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful |
Annotations
| Source field | Printed as |
|---|---|
region: MemoryRegion | /* scratch */ · /* free_ptr */ · /* dynamic */ (Unknown is suppressed) |
Assigned at translation time from the constant offset (if any); consumed by mem_opt, FMP propagation, and byte-swap mode selection.
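Region tagging for constant offsets can be sketched as a simple classifier. The sketch uses only the two ranges given in the example above: 0x00..0x3f is scratch, and exactly 0x40 is free_ptr. This section does not describe the rules that produce /* dynamic */. The hypothetical classify function below therefore maps every other offset, including non-constant ones, to Unknown. The annotation helper follows the printing rule in the table.

```rust
/// Subset of the region tags; `Dynamic` is omitted because its
/// assignment rules are not modelled in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryRegion {
    Unknown,
    Scratch,
    FreePtr,
}

/// Classify an mstore/mstore8 offset; `None` means "not a compile-time constant".
fn classify(offset: Option<u64>) -> MemoryRegion {
    match offset {
        Some(0x00..=0x3f) => MemoryRegion::Scratch,
        Some(0x40) => MemoryRegion::FreePtr,
        _ => MemoryRegion::Unknown,
    }
}

/// Printer rule: Unknown is suppressed, other regions print as a comment.
fn annotation(region: MemoryRegion) -> &'static str {
    match region {
        MemoryRegion::Scratch => " /* scratch */",
        MemoryRegion::FreePtr => " /* free_ptr */",
        MemoryRegion::Unknown => "",
    }
}
```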
mstore8
(Statement::MStore8)
Description
Write a single byte to emulated EVM linear memory at offset. The low 8 bits of value are stored; the upper bits are ignored. The operation is otherwise identical to mstore: same operand shape, same region tag, same side-effect classification.
Syntax
mstore8($offset[: <type>], $value[: <type>]) [/* <region> */]
Example
mstore8(v0, v1: i8)
Operands
| Name | Type | Notes |
|---|---|---|
offset | i256 | Byte offset into linear memory; forward analysis widens to at least i64. |
value | i256 | Only the low 8 bits are stored. Often narrowed to i8 by type inference. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful |
Annotations
| Source field | Printed as |
|---|---|
region: MemoryRegion | /* scratch */ · /* free_ptr */ · /* dynamic */ (Unknown is suppressed) |
Same tagging rules as mstore. Most mstore8s carry an Unknown region in practice because single-byte writes typically target offsets the translator cannot prove constant.
mcopy
(Statement::MCopy)
Description
Copy length bytes from src to dest within emulated EVM linear memory. The Yul builtin mcopy maps directly onto this statement; unlike mstore, it does not carry a region tag because the source and destination ranges may straddle multiple regions.
Syntax
mcopy($dest[: <type>], $src[: <type>], $length[: <type>])
Example
mcopy(v0, v1, v2)
Operands
| Name | Type | Notes |
|---|---|---|
dest | i256 | Destination byte offset in linear memory. |
src | i256 | Source byte offset in linear memory. |
length | i256 | Number of bytes to copy. Overlapping ranges follow EVM-defined memmove semantics. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful |
Annotations
None. mcopy carries no region tag because the source and destination ranges may straddle multiple regions, and no static-slot hint because the copy is not storage-bound.
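Because mcopy follows memmove semantics, a byte-vector model can rely on Rust's slice::copy_within, which handles overlapping source and destination ranges correctly. The helper below is a hypothetical sketch, not the codegen. It also zero-extends memory when either range lies past the current end.

```rust
/// Hypothetical mcopy model; overlapping ranges behave like memmove.
fn mcopy(memory: &mut Vec<u8>, dest: usize, src: usize, length: usize) {
    if length == 0 {
        return; // nothing to copy in this model
    }
    // Both ranges must be addressable; new bytes read as zero.
    let needed = dest.max(src) + length;
    if memory.len() < needed {
        memory.resize(needed, 0);
    }
    memory.copy_within(src..src + length, dest);
}
```

A naive forward byte loop would produce 0, 1, 0, 1, 0, 1 for the overlapping copy in the first test. copy_within gives the memmove result.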
sstore
(Statement::SStore)
Description
Write a 32-byte word to persistent contract storage at key. The operation is the durable counterpart of mstore: the value survives across transactions and is observable to subsequent calls to the contract.
Syntax
sstore($key[: <type>], $value[: <type>]) [/* slot: 0x<hex> */]
Example
sstore(v0, v1)
sstore(v2, v3) /* slot: 0x0 */
Operands
| Name | Type | Notes |
|---|---|---|
key | i256 | Storage slot. May be a constant slot, a keccak-derived slot for mappings or dynamic arrays, or an arbitrary expression. |
value | i256 | The 256-bit word to store. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful |
Annotations
| Source field | Printed as |
|---|---|
static_slot: Option<BigUint> | /* slot: 0x<hex> */ when set; suppressed otherwise |
The printer renders the annotation whenever the field is Some, and the deduplicator’s canonicalizer and mapping-fusion analyses consume it as part of the signature. No pass currently writes Some(...), so the annotation is dormant in present-day dumps; when absent, alias and dedup analyses fall back to the conservative “may alias any slot” assumption.
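The conservative fallback amounts to a very small alias query. In this hypothetical sketch, slots are u128 rather than 256-bit BigUint values, for brevity. Two storage accesses are known not to alias only when both carry a static slot and the slots differ. The printing rule is included for comparison with the sstore examples.

```rust
/// Hypothetical alias query over two static_slot hints.
fn may_alias(a: Option<u128>, b: Option<u128>) -> bool {
    match (a, b) {
        // Both slots known: they alias only if they are the same slot.
        (Some(x), Some(y)) => x == y,
        // Either slot unknown: assume they may alias.
        _ => true,
    }
}

/// Printer rule: render the hint only when it is present.
fn slot_annotation(slot: Option<u128>) -> String {
    match slot {
        Some(s) => format!(" /* slot: {:#x} */", s),
        None => String::new(),
    }
}
```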
tstore
(Statement::TStore)
Description
Write a 32-byte word to transient storage at key. Transient storage is wiped at the end of the transaction, so tstore is the right primitive for per-transaction bookkeeping (reentrancy guards, cached results) without the gas cost of sstore on EVM. On PolkaVM the transient backing store is provided by pallet-revive.
Syntax
tstore($key[: <type>], $value[: <type>])
Example
tstore(v0, v1)
Operands
| Name | Type | Notes |
|---|---|---|
key | i256 | Transient storage slot. |
value | i256 | The 256-bit word to store. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful |
Annotations
None. Unlike sstore, the IR does not track a static slot for tstore: transient storage’s short-lived lifetime makes the slot-aware optimizations less valuable, and the translator does not produce the annotation.
mapping_sstore
(Statement::MappingSStore)
Description
Compound store for a Solidity mapping element. Equivalent to mstore(0, key); mstore(32, slot); sstore(keccak256(0, 64), value) but emitted as a single outlined statement after the mapping_access_outlining pass recognizes the pattern (it fuses a keccak256_pair followed by an sstore whose key has a single consumer). Only valid when the intermediate hash is not observed by any other statement.
Syntax
mapping_sstore($key[: <type>], $slot[: <type>], $value[: <type>])
Example
mapping_sstore(v0, v1, v2)
Operands
| Name | Type | Notes |
|---|---|---|
key | i256 | Mapping key. The outlining pass force-widens it to i256, so it always prints at full width, even for address keys. |
slot | i256 | The mapping’s declared storage slot. Typically a small constant. |
value | i256 | The value to store at the computed storage location. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful |
Annotations
None. mapping_sstore deliberately drops the static_slot annotation that the original sstore may have carried, because the fused statement’s effective slot is the keccak hash of the key and the declared slot, which is never a compile-time constant.
Bulk copies
Multi-byte memory copies from the EVM-accessible byte sources (code, external code, returndata, embedded data, and calldata) into emulated EVM linear memory. They share the same core shape: a destination memory offset, a source offset, and a length. extcodecopy additionally takes the address whose code is read. They are effectful and act as opaque barriers to the memory passes.
codecopy
(Statement::CodeCopy)
Description
Copy length bytes from the currently executing code at offset into emulated EVM linear memory at dest. Reads past the end of code yield zero bytes.
Syntax
codecopy($dest[: <type>], $offset[: <type>], $length[: <type>])
Example
codecopy(v0, v1, v2)
Operands
| Name | Type | Notes |
|---|---|---|
dest | i256 | Destination byte offset in linear memory. |
offset | i256 | Source byte offset in the executing code. |
length | i256 | Number of bytes to copy. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful |
Annotations
None.
extcodecopy
(Statement::ExtCodeCopy)
Description
Copy length bytes from the code at address starting at offset into emulated EVM linear memory at dest. Reads beyond the code yield zero bytes; non-existent accounts yield all zeros.
Syntax
extcodecopy($address[: <type>], $dest[: <type>], $offset[: <type>], $length[: <type>])
Example
extcodecopy(v0: i160, v1, v2, v3)
Operands
| Name | Type | Notes |
|---|---|---|
address | i256 | Account whose code to read; forward analysis widens to at least i160. |
dest | i256 | Destination byte offset in linear memory. |
offset | i256 | Source byte offset in the external code. |
length | i256 | Number of bytes to copy. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful |
Annotations
None.
returndatacopy
(Statement::ReturnDataCopy)
Description
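Copy length bytes from the most recent sub-call’s return data starting at offset into emulated EVM linear memory at dest. Unlike the other bulk copies, reads past the end of the return data are not zero-padded. Per EVM semantics (EIP-211), they abort execution with an exceptional halt, so the memory passes treat this statement as a potential trap site.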
Syntax
returndatacopy($dest[: <type>], $offset[: <type>], $length[: <type>])
Example
returndatacopy(v0, v1, v2)
Operands
| Name | Type | Notes |
|---|---|---|
dest | i256 | Destination byte offset in linear memory. |
offset | i256 | Source byte offset in the return-data buffer. |
length | i256 | Number of bytes to copy. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful (may trap on out-of-range reads, per EVM) |
Annotations
None.
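A small model shows how returndatacopy differs from the zero-padding copies. It follows the EIP-211 bounds rule: the hypothetical helper fails whenever offset + length exceeds the return-data size, even for a zero-length copy, instead of padding with zeros. Representing the failure as a Rust Err is an illustration only; at runtime it ends execution.

```rust
#[derive(Debug, PartialEq)]
struct ReturnDataOutOfBounds;

/// Hypothetical returndatacopy model over a byte-vector memory.
fn returndatacopy(
    memory: &mut Vec<u8>,
    return_data: &[u8],
    dest: usize,
    offset: usize,
    length: usize,
) -> Result<(), ReturnDataOutOfBounds> {
    // Bounds check first (also guards against offset + length overflowing).
    let end = offset.checked_add(length).ok_or(ReturnDataOutOfBounds)?;
    if end > return_data.len() {
        return Err(ReturnDataOutOfBounds);
    }
    if length == 0 {
        return Ok(());
    }
    if memory.len() < dest + length {
        memory.resize(dest + length, 0);
    }
    memory[dest..dest + length].copy_from_slice(&return_data[offset..end]);
    Ok(())
}
```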
datacopy
(Statement::DataCopy)
Description
Copy length bytes from an embedded data segment starting at offset into emulated EVM linear memory at dest. The source segment is resolved by the linker. datacopy is typically used to pull constants compiled into the bytecode into runtime memory.
Syntax
datacopy($dest[: <type>], $offset[: <type>], $length[: <type>])
Example
datacopy(v0, v1, v2)
Operands
| Name | Type | Notes |
|---|---|---|
dest | i256 | Destination byte offset in linear memory. |
offset | i256 | Source byte offset in the data segment. |
length | i256 | Number of bytes to copy. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful |
Annotations
None.
calldatacopy
(Statement::CallDataCopy)
Description
Copy length bytes from the current call’s calldata starting at offset into emulated EVM linear memory at dest. Reads past the end of calldata yield zero bytes.
Syntax
calldatacopy($dest[: <type>], $offset[: <type>], $length[: <type>])
Example
calldatacopy(v0, v1, v2)
Operands
| Name | Type | Notes |
|---|---|---|
dest | i256 | Destination byte offset in linear memory. |
offset | i256 | Source byte offset in calldata. |
length | i256 | Number of bytes to copy. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful |
Annotations
None.
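By contrast, codecopy and calldatacopy pad any read past the end of their source with zero bytes. Here is a minimal model over byte vectors; copy_zero_padded is a hypothetical helper name.

```rust
/// Hypothetical calldatacopy / codecopy model: bytes past the end of
/// `source` read as zero.
fn copy_zero_padded(memory: &mut Vec<u8>, source: &[u8], dest: usize, offset: usize, length: usize) {
    if length == 0 {
        return;
    }
    if memory.len() < dest + length {
        memory.resize(dest + length, 0);
    }
    for i in 0..length {
        // An index past the end of `source` (or an overflowing one) yields zero.
        let byte = offset
            .checked_add(i)
            .and_then(|idx| source.get(idx))
            .copied()
            .unwrap_or(0);
        memory[dest + i] = byte;
    }
}
```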
Bindings and wrappers
The statements that bind SSA values, hold loose expressions evaluated for their side effects, and write to immutable storage. Every pure expression in this reference’s earlier sections appears on the right-hand side of one of these statements (almost always let).
let
(Statement::Let)
Description
SSA binding: evaluate an expression and bind its result(s) to a list of fresh value ids. The let statement is the only mechanism by which pure expressions enter the value namespace; every v<id> in a dump was produced by a let (or by a value-yielding control-flow statement or by a parameter at function entry).
Syntax
let $binding_0[, $binding_1, …] := $expression
Example
let v3 := add(v0, v1)
let v4, v5 := if v2 [v0, v1] { … } else { … } // multi-binding from a value-yielding If
Operands
| Name | Type | Notes |
|---|---|---|
bindings | Vec<ValueId> | One or more fresh SSA ids to bind. Most expressions produce one value; control-flow statements may produce several. |
value | Expression | The right-hand side; see any of the Pure expression entries. |
Result and purity
| Result | Purity |
|---|---|
| None directly — the bound ids carry the expression’s result(s) | Effectful (binding establishment); the right-hand side’s purity is independent |
Annotations
None.
Expression statement
(Statement::Expression)
Description
Wraps an expression evaluated for its observable consequences but whose value is not bound. Typically a zero-return (void) user-defined function call (Expression::Call) evaluated for its side effects, or the discarded void result of a Yul builtin used as a statement (a value-producing expression is bound by a let instead). EVM external calls (call, delegatecall, etc.) and contract creation (create, create2) translate to dedicated Statement::ExternalCall and Statement::Create variants, not through this wrapper.
Syntax
$expression
Example
update_balance(v0) // void function called for its side effects
Operands
| Name | Type | Notes |
|---|---|---|
expression | Expression | Any expression; result is discarded. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful (per its statement position) |
Annotations
None.
setimmutable
(Statement::SetImmutable)
Description
Write an immutable variable during contract construction. Immutables are written once in the constructor and read later via loadimmutable. The key is a string identifier resolved by the linker.
Syntax
setimmutable("<key>", $value[: <type>])
Example
setimmutable("MyContract.owner", v0)
Operands
| Name | Type | Notes |
|---|---|---|
value | i256 | The value to store; the key is a quoted string literal in the syntax position. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful |
Annotations
| Source field | Printed as |
|---|---|
key: String | The quoted identifier in the syntax position. |
Structured control flow
The IR’s control flow is structured: if, switch, and for are statements with explicit nested regions, each carrying input values and yielding output values. The jump-like statements (break, continue, leave) are scoped to their nearest enclosing construct. Nested blocks create lexical scope without otherwise changing control flow.
if
(Statement::If)
Description
Conditional execution with optional value yields. The then region runs when condition is non-zero; the else region runs otherwise. If outputs is non-empty, both regions must yield the same number of values and the statement is bound by a let.
Syntax
if $condition[: <type>] [[$input_0, $input_1, …]] { … } [else { … }]
Example
if v0: i1 {
sstore(v1, v2)
}
let v5, v6 := if v3: i1 [v1, v2] {
let v7: i64 := 0x1 // add widens its operands to the i64 register width
let v8 := add(v2, v7: i64)
yield v1, v8
} else {
yield v1, v2
}
Operands
| Name | Type | Notes |
|---|---|---|
condition | i256 | Branch selector; non-zero takes the then region. Often narrowed to i1. |
inputs | Vec<Value> | Values threaded into both regions, printed in square brackets after the condition. |
| (regions) | — | The then_region is mandatory; the else_region is optional and, when absent, implicitly yields the inputs unchanged. |
Result and purity
| Result | Purity |
|---|---|
None for the statement form; for the value-yielding form, one value per outputs binding, types taken from the yielded values | Effectful (control flow) |
Annotations
None.
switch
(Statement::Switch)
Description
Multi-way dispatch on a scrutinee value. Each case matches a specific constant and runs its region; an optional default region catches non-matching values. Like if, switch may yield values via outputs and accept thread-through values via inputs.
Syntax
switch $scrutinee[: <type>] [[$input_0, …]]
case 0x<hex> {
…
}
[case 0x<hex> {
…
} …]
[default {
…
}]
Example
switch v0
case 0x0 {
sstore(v1, v2)
}
case 0x1 {
sstore(v1, v3)
}
default {
invalid()
}
Operands
| Name | Type | Notes |
|---|---|---|
scrutinee | i256 | The value to compare against each case. |
inputs | Vec<Value> | Values threaded into every case and default region. |
cases | Vec<SwitchCase> | Each case carries a constant value: BigUint and a region. |
| (default) | — | Optional fall-through region. |
Result and purity
| Result | Purity |
|---|---|
None for the statement form; one value per outputs binding for the value-yielding form | Effectful (control flow) |
Annotations
None.
for
(Statement::For)
Description
Structured loop with explicit loop-carried variables. Each iteration evaluates condition_statements followed by condition; if the condition is non-zero, the body region runs, then the post region runs, and the loop iterates. Loop-carried variables are passed as SSA values through each region. break exits the loop and continue jumps to the post region.
Syntax
for { $variable_0 := $initial_0[, …] }
[// condition statements:
…]
condition: $condition
post [($post_input_variable_0[, …])] {
…
}
body {
… body …
}
Example
let v0: i1 := 0x0
let v6 := for { v1 := v0: i1 }
// condition statements:
let v2: i8 := 0xa
condition: lt(v1, v2: i8)
post (v3) {
let v4: i64 := 0x1
let v5 := add(v3, v4: i64)
yield v5
}
body {
sstore(v1, v1)
0x0: void
yield v1
}
Operands
| Name | Type | Notes |
|---|---|---|
initial_values | Vec<Value> | Starting values for the loop-carried variables. |
loop_variables | Vec<ValueId> | SSA ids visible inside condition, body, and post. |
condition_statements | Vec<Statement> | Statements evaluated each iteration before the condition expression; emitted into the loop header block. Printed only when non-empty, behind a // condition statements: comment. |
condition | Expression | Re-evaluated each iteration; non-zero continues, zero exits. |
body | Region | Loop body; yields current loop-carried values. |
post_input_variables | Vec<ValueId> | Input SSA ids for the post region (one per loop-carried variable); receive the body’s yielded values merged with continue-site values via phi nodes in the LLVM codegen. |
post | Region | Runs after each body iteration (and after continue); yields updated loop-carried values. |
outputs | Vec<ValueId> | Final loop-carried values after exit. |
Result and purity
| Result | Purity |
|---|---|
None for the statement form; one value per outputs binding for the value-yielding form | Effectful (control flow) |
Annotations
None.
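The for example above maps onto an ordinary Rust loop once the loop-carried variable is threaded by hand. This is an illustrative translation, not compiler output. Storage is modelled as a HashMap<u64, u64>, and the example's value ids appear as local names so each region can be matched up.

```rust
use std::collections::HashMap;

/// Illustrative translation of the IR `for` example (not compiler output).
fn run_example_loop(storage: &mut HashMap<u64, u64>) -> u64 {
    let v0: u64 = 0x0;
    // initial_values: the loop variable v1 starts as v0
    let mut v1 = v0;
    loop {
        // condition statements, then the condition expression lt(v1, v2)
        let v2: u64 = 0xa;
        let condition = v1 < v2;
        if !condition {
            break; // zero condition exits; the current v1 becomes the output
        }
        // body region: sstore(v1, v1), then `yield v1`
        storage.insert(v1, v1);
        let body_yield = v1;
        // post region: its input variable v3 receives the body's yield
        let v3 = body_yield;
        let v4: u64 = 0x1;
        let v5 = v3 + v4;
        // post yields v5, which becomes the next value of v1
        v1 = v5;
    }
    // outputs: bound to v6 in the example
    v1
}
```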
break
(Statement::Break)
Description
Exit the innermost enclosing for loop. Carries the current values of loop-carried variables at the break point; these become the loop’s outputs.
Syntax
break [[$value_0, $value_1, …]]
Example
if v0 { break [v1, v2] }
Operands
The loop-carried values (Vec<Value>) print in brackets when non-empty (e.g. break [v1, v2]).
Result and purity
| Result | Purity |
|---|---|
| None | Effectful (control flow) |
Annotations
None.
continue
(Statement::Continue)
Description
Skip to the post region of the innermost enclosing for loop. Like break, it carries the current values of the loop-carried variables; these feed the post region’s input variables.
Syntax
continue [[$value_0, $value_1, …]]
Example
if v0 { continue [v1, v2] }
Operands
The loop-carried values print in brackets when non-empty (e.g. continue [v1, v2]).
Result and purity
| Result | Purity |
|---|---|
| None | Effectful (control flow) |
Annotations
None.
leave
(Statement::Leave)
Description
Exit the current function, returning the listed values as the function’s return values. The Yul-level leave keyword translates directly to this statement; the inlining pass eliminates intra-function leaves where possible via the exit-flag transformation.
Syntax
leave [[$value_0[: <type>], $value_1[: <type>], …]]
Example
leave [v0, v1] // returns v0 and v1 from the function
leave // returns nothing (void function)
Operands
| Name | Type | Notes |
|---|---|---|
return_values | Vec<Value> | Empty for void functions; otherwise one entry per declared return. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful (control flow) |
Annotations
None.
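This section does not spell out how the exit-flag transformation works. The Rust below is only a generic illustration of the idea, with hypothetical function names. An early return inside a loop is replaced by a flag that stops the loop and guards the code that would otherwise run after it, so the function has a single exit.

```rust
/// Original shape: an early `leave` inside a loop.
fn find_with_leave(xs: &[u64], target: u64) -> u64 {
    for (i, &x) in xs.iter().enumerate() {
        if x == target {
            return i as u64; // leave [i]
        }
    }
    u64::MAX
}

/// Same behaviour after a generic exit-flag rewrite: no early return.
fn find_with_exit_flag(xs: &[u64], target: u64) -> u64 {
    let mut exited = false;
    let mut ret = 0u64;
    let mut i = 0usize;
    while !exited && i < xs.len() {
        if xs[i] == target {
            ret = i as u64;
            exited = true; // replaces the early leave
        } else {
            i += 1;
        }
    }
    // Code after the loop runs only if no leave was taken.
    if !exited {
        ret = u64::MAX;
    }
    ret
}
```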
Nested block
(Statement::Block)
Description
A lexical scope without conditional or iterative behavior. The body is a region; control falls through after the region’s statements complete. Used to bound the visibility of inner bindings.
Syntax
{
…
}
Example
{
let v0 := add(v1, v2)
sstore(v3, v0)
} // v0 is no longer in scope here
Operands
None — the body is a region, not an operand.
Result and purity
| Result | Purity |
|---|---|
| None | Effectful (per the body’s contents) |
Annotations
None.
External interaction
Statements that cross the contract boundary: external calls, contract creation, and event log emission. All produce or rely on external state and act as barriers to memory and storage analyses.
call
(Statement::ExternalCall with CallKind::Call)
Description
Standard external call that may transfer value. Reads args_length bytes from emulated EVM linear memory at args_offset as calldata, executes the target, and writes up to ret_length bytes of return data into linear memory at ret_offset. The boolean result indicates success.
Syntax
let $result := call($gas[: <type>], $address[: <type>], $value[: <type>], $args_offset[: <type>], $args_length[: <type>], $ret_offset[: <type>], $ret_length[: <type>])
Example
let v8 := call(v0: i64, v1: i160, v2, v3: i64, v4: i64, v5: i64, v6: i64)
Operands
| Name | Type | Notes |
|---|---|---|
gas | i256 | Gas to forward to the target; forward analysis widens to at least i64. |
address | i256 | Callee address; forward analysis widens to at least i160. |
value | i256 | Wei to transfer with the call. |
args_offset | i256 | Calldata source offset in linear memory; forward analysis widens to at least i64. |
args_length | i256 | Calldata length in bytes; forward analysis widens to at least i64. |
ret_offset | i256 | Return-data destination offset in linear memory; forward analysis widens to at least i64. |
ret_length | i256 | Maximum return-data length; forward analysis widens to at least i64. |
Result and purity
| Result | Purity |
|---|---|
i256 (success flag: 1 on success, 0 on revert/error; narrowable to i1) | Effectful |
Annotations
None.
callcode
(Statement::ExternalCall with CallKind::CallCode)
Description
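Deprecated EVM opcode that executes the callee’s code in the caller’s context, against the caller’s storage. Unlike delegatecall, it takes an explicit value operand, and the callee sees the calling contract (not the original sender) as the sender. Not supported by the newyork backend (codegen rejects it); use delegatecall instead.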
Syntax
let $result := callcode($gas[: <type>], $address[: <type>], $value[: <type>], $args_offset[: <type>], $args_length[: <type>], $ret_offset[: <type>], $ret_length[: <type>])
Example
let v8 := callcode(v0: i64, v1: i160, v2, v3: i64, v4: i64, v5: i64, v6: i64)
Operands
Same shape as call.
Result and purity
| Result | Purity |
|---|---|
i256 (success flag; narrowable to i1) | Effectful |
Annotations
None.
delegatecall
(Statement::ExternalCall with CallKind::DelegateCall)
Description
Execute the callee’s code in the caller’s context: same storage, same sender, same call value. The standard mechanism for library calls and proxy patterns. No value operand (the caller’s call value is inherited).
Syntax
let $result := delegatecall($gas[: <type>], $address[: <type>], $args_offset[: <type>], $args_length[: <type>], $ret_offset[: <type>], $ret_length[: <type>])
Example
let v7 := delegatecall(v0: i64, v1: i160, v2: i64, v3: i64, v4: i64, v5: i64)
Operands
Same shape as call minus the value operand.
Result and purity
| Result | Purity |
|---|---|
i256 (success flag; narrowable to i1) | Effectful |
Annotations
None.
staticcall
(Statement::ExternalCall with CallKind::StaticCall)
Description
Read-only external call. Any state modification in the callee (including nested calls) causes the call to revert. No value operand.
Syntax
let $result := staticcall($gas[: <type>], $address[: <type>], $args_offset[: <type>], $args_length[: <type>], $ret_offset[: <type>], $ret_length[: <type>])
Example
let v7 := staticcall(v0: i64, v1: i160, v2: i64, v3: i64, v4: i64, v5: i64)
Operands
Same shape as call minus the value operand.
Result and purity
| Result | Purity |
|---|---|
i256 (success flag; narrowable to i1) | Effectful (no state writes, but still an external boundary and may revert) |
Annotations
None.
create
(Statement::Create with CreateKind::Create)
Description
Deploy a new contract with the given init-code bytes, transferring value wei from the caller. The new contract’s address is derived from the caller’s address and nonce; on failure the result is 0.
Syntax
let $result := create($value[: <type>], $offset[: <type>], $length[: <type>])
Example
let v4 := create(v0, v1: i64, v2: i64)
Operands
| Name | Type | Notes |
|---|---|---|
value | i256 | Wei to transfer to the new contract. |
offset | i256 | Linear-memory offset of the init code; forward analysis widens to at least i64. |
length | i256 | Length of the init code in bytes; forward analysis widens to at least i64. |
Result and purity
| Result | Purity |
|---|---|
i256 (created address; narrowable to i160 on success, 0 on failure) | Effectful |
Annotations
None.
create2
(Statement::Create with CreateKind::Create2)
Description
Deploy a new contract with a deterministic address derived from the caller’s address, the salt, and the init-code hash. Same operand shape as create plus an additional salt.
Syntax
let $result := create2($value[: <type>], $offset[: <type>], $length[: <type>], $salt[: <type>])
Example
let v5 := create2(v0, v1: i64, v2: i64, v3)
Operands
Same as create plus salt: i256.
Result and purity
| Result | Purity |
|---|---|
i256 (created address; narrowable to i160 on success, 0 on failure) | Effectful |
Annotations
None.
log<N>
(Statement::Log)
Description
Emit an event log entry. The mnemonic suffix <N> is the number of indexed topics (0 through 4), determined by the length of the IR’s topics field. The data portion is read from length bytes of emulated EVM linear memory at offset.
Syntax
log<N>($offset[: <type>], $length[: <type>][, $topic_0[: <type>], …])
Example
log0(v0: i64, v1: i64) // no topics
log2(v0: i64, v1: i64, v2, v3) // two topics
Operands
| Name | Type | Notes |
|---|---|---|
offset | i256 | Data source offset in linear memory; forward analysis widens to at least i64. |
length | i256 | Data length in bytes; forward analysis widens to at least i64. |
topics | Vec<Value> | Zero to four indexed topic values; the length determines the mnemonic suffix. |
Result and purity
| Result | Purity |
|---|---|
| None | Effectful |
Annotations
None.
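The mnemonic rule is mechanical enough to sketch as a printer helper. print_log is a hypothetical function that reproduces the two examples above. It rejects more than four topics, since the EVM has no LOG5.

```rust
/// Hypothetical printer for log<N>: the suffix is the topic count.
fn print_log(offset: &str, length: &str, topics: &[&str]) -> Option<String> {
    if topics.len() > 4 {
        return None; // LOG0..LOG4 only
    }
    let mut operands = vec![offset, length];
    operands.extend_from_slice(topics);
    Some(format!("log{}({})", topics.len(), operands.join(", ")))
}
```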
Termination
Statements that end the current call frame. These are the plain forms (return, revert, stop), the unconditional invalid trap, selfdestruct, and the outlined revert variants (panic_revert, error_string_revert, custom_error_revert). The outlined variants encode common Solidity error patterns into single nodes that can be deduplicated across call sites.
return
(Statement::Return)
Description
End the current call frame successfully, returning length bytes from emulated EVM linear memory at offset as the return data.
Syntax
return($offset[: <type>], $length[: <type>])
Example
return(v0: i64, v1: i64)
Operands
| Name | Type | Notes |
|---|---|---|
offset | i256 | Return-data source offset; forward analysis widens to at least i64. |
length | i256 | Return-data length; forward analysis widens to at least i64. |
Result and purity
| Result | Purity |
|---|---|
| None — terminates the call frame | Effectful (terminator) |
Annotations
None.
revert
(Statement::Revert)
Description
End the current call frame with a revert, undoing all state changes made during the call, and returning length bytes of revert data from emulated EVM linear memory at offset.
Syntax
revert($offset[: <type>], $length[: <type>])
Example
revert(v0: i64, v1: i64)
Operands
| Name | Type | Notes |
|---|---|---|
offset | i256 | Revert-data source offset; forward analysis widens to at least i64. |
length | i256 | Revert-data length; forward analysis widens to at least i64. |
Result and purity
| Result | Purity |
|---|---|
| None — terminates the call frame | Effectful (terminator) |
Annotations
None.
stop
(Statement::Stop)
Description
End the current call frame successfully with empty return data.
Syntax
stop()
Example
stop()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
| None — terminates the call frame | Effectful (terminator) |
Annotations
None.
invalid
(Statement::Invalid)
Description
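Unconditional invalid-opcode trap. Consumes all remaining gas and reverts. Used for unreachable branches. Since Solidity v0.8.0, failing assert statements revert with Panic(0x01) (see panic_revert) rather than executing invalid.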
Syntax
invalid()
Example
invalid()
Operands
None.
Result and purity
| Result | Purity |
|---|---|
| None — terminates the call frame | Effectful (terminator) |
Annotations
None.
selfdestruct
(Statement::SelfDestruct)
Description
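End the current call frame and transfer the contract’s remaining balance to address. Since the Cancun upgrade (EIP-6780), the contract’s code and storage are deleted only if it was created in the same transaction; otherwise only the balance transfer takes effect. selfdestruct is effectively deprecated, and the opcode remains for legacy compatibility.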
Syntax
selfdestruct($address[: <type>])
Example
selfdestruct(v0: i160)
Operands
| Name | Type | Notes |
|---|---|---|
address | i256 | Recipient of the contract’s balance; forward analysis widens to at least i160. |
Result and purity
| Result | Purity |
|---|---|
| None — terminates the call frame | Effectful (terminator) |
Annotations
None.
panic_revert
(Statement::PanicRevert)
Description
Outlined Solidity panic revert. Equivalent to writing the Panic(uint256) ABI encoding (selector 0x4e487b71 plus the panic code) into emulated EVM linear memory and reverting, but emitted as a single statement that lowers to one outlined helper call. Common panic codes: 0x01 assertion failure, 0x11 arithmetic overflow, 0x12 division by zero, 0x32 array-out-of-bounds, 0x41 memory overflow.
Syntax
panic_revert(0x<hex>)
Example
panic_revert(0x11) // arithmetic overflow
Operands
None — the panic code is stored as a u8 field on the IR, not an SSA operand.
Result and purity
| Result | Purity |
|---|---|
| None — terminates the call frame | Effectful (terminator) |
Annotations
| Source field | Printed as |
|---|---|
code: u8 | The panic code in 0x<hex> form (two hex digits, zero-padded). |
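For reference, the revert data that panic_revert stands for can be built as shown below. The sketch produces only the 36-byte ABI payload: the selector followed by one 32-byte big-endian word. It does not model where the outlined helper places this payload in linear memory. The printer helper mirrors the two-digit, zero-padded format in the table. Both function names are hypothetical.

```rust
/// ABI payload of Panic(uint256) for a given panic code.
fn panic_payload(code: u8) -> Vec<u8> {
    let mut data = vec![0x4e, 0x48, 0x7b, 0x71]; // Panic(uint256) selector
    let mut word = [0u8; 32];
    word[31] = code; // codes fit in one byte: only the last byte is non-zero
    data.extend_from_slice(&word);
    data
}

/// Printer rule: two hex digits, zero-padded.
fn print_panic_revert(code: u8) -> String {
    format!("panic_revert(0x{:02x})", code)
}
```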
error_string_revert
(Statement::ErrorStringRevert)
Description
Outlined Solidity Error(string) revert. Equivalent to writing the Error selector (0x08c379a0), the string offset and length, and up to four 32-byte data words into emulated EVM linear memory and reverting. The string length and the data words are stored as compile-time fields; no SSA operands.
Syntax
error_string_revert(<length>, <N>_words)
Example
error_string_revert(12, 1_words) // 12-byte string in one 32-byte word
Operands
None — the string length and data are compile-time fields, not SSA operands.
Result and purity
| Result | Purity |
|---|---|
| None — terminates the call frame | Effectful (terminator) |
Annotations
| Source field | Printed as |
|---|---|
| length: u8 | The string length in bytes, in the first syntax position. |
| data: Vec<BigUint> | The number of 32-byte data words (1–4), printed as <N>_words in the second syntax position. The actual data is stored separately and not shown in the printed form. |
custom_error_revert
(Statement::CustomErrorRevert)
Description
Outlined Solidity custom-error revert. Encodes the error selector (left-shifted by 224 bits) and zero or more argument values into scratch memory and reverts. No free memory pointer (FMP) load is needed, because the encoding uses the scratch region at offset 0.
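The description amounts to three steps. First, store shl(224, selector) at offset 0. Next, store argument i at offset 4 + 32 * i. Finally, revert with the first 4 + 32 * n bytes. The Rust sketch below models scratch memory as a plain byte vector, which simplifies emulated EVM linear memory. The function names are hypothetical.

```rust
/// A 256-bit word, big-endian.
type Word = [u8; 32];

/// shl(224, selector): the 32-bit selector moved into the four most
/// significant bytes of a big-endian 256-bit word.
fn shl_224(selector: u32) -> Word {
    let mut w = [0u8; 32];
    w[..4].copy_from_slice(&selector.to_be_bytes());
    w
}

/// Store the shifted selector at offset 0 and argument i at offset 4 + 32 * i,
/// then return memory[0 .. 4 + 32 * n] as the revert data.
fn encode_custom_error(selector: u32, args: &[Word]) -> Vec<u8> {
    let size = 4 + 32 * args.len();
    // Memory must hold the full word written at offset 0, even with no arguments.
    let mut memory = vec![0u8; size.max(32)];
    memory[..32].copy_from_slice(&shl_224(selector));
    for (i, arg) in args.iter().enumerate() {
        let offset = 4 + 32 * i;
        memory[offset..offset + 32].copy_from_slice(arg);
    }
    memory.truncate(size);
    memory
}

fn main() {
    let mut arg = [0u8; 32];
    arg[31] = 7; // the uint256 value 7
    let payload = encode_custom_error(0xa28c4c11, &[arg]);
    println!("{} bytes, selector {:02x?}", payload.len(), &payload[..4]);
}
```

Writing the whole shifted word first and then overlaying the arguments is harmless. The bytes that the first argument overwrites are the zero bits below the selector.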
Syntax
custom_error_revert(0x<hex>, [$arg_0, $arg_1, …])
Example
custom_error_revert(0xa28c4c11, [v0, v1])
Operands
| Name | Type | Notes |
|---|---|---|
| arguments | Vec<Value> | Zero or more argument values; the selector is a compile-time field. |
Result and purity
| Result | Purity |
|---|---|
| None — terminates the call frame | Effectful (terminator) |
Annotations
| Source field | Printed as |
|---|---|
| selector: BigUint | The 4-byte error selector in hex, in the first syntax position. The selector is stored left-shifted by 224 bits; the printer right-shifts it back and prints the bare 4-byte value. |
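The printer’s reverse step, shr(224, ...), keeps the four most significant bytes of the stored word. The Rust sketch below prints the statement under that reading. Two details are assumptions of this sketch and are not confirmed by the source: the stored value is modelled as a big-endian byte array, and the selector is zero-padded to eight hex digits.

```rust
/// A 256-bit word, big-endian.
type Word = [u8; 32];

/// shr(224, word): keep the four most significant bytes of a big-endian
/// 256-bit word, recovering the bare 4-byte selector.
fn shr_224(word: &Word) -> u32 {
    u32::from_be_bytes([word[0], word[1], word[2], word[3]])
}

/// Hypothetical printer; the eight-digit zero padding is an assumption.
fn print_custom_error_revert(stored_selector: &Word, args: &[&str]) -> String {
    format!(
        "custom_error_revert(0x{:08x}, [{}])",
        shr_224(stored_selector),
        args.join(", ")
    )
}

fn main() {
    let mut stored = [0u8; 32];
    stored[..4].copy_from_slice(&0xa28c4c11u32.to_be_bytes()); // selector << 224
    println!("{}", print_custom_error_revert(&stored, &["v0", "v1"]));
}
```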
PVM and the pallet-revive runtime target
The revive compiler targets PolkaVM (PVM) via pallet-revive on Polkadot.
Target CPU configuration
The exact target CPU configuration can be found here.
Note
The PVM linker requires fully relocatable ELF objects.
Why PVM
PVM is a RISC-V based VM designed to overcome the flaws of WebAssembly (Wasm). Wasm was believed to be a more efficient successor to the rather slow EVM. However, Wasm is far from an ideal target for smart contracts, as some of its design decisions are unfavorable for short-lived workloads. The main problem is the overhead of compiling or interpreting Wasm bytecode on-chain. Prior benchmarks consistently ignored this overhead, which seeded the blockchain industry with flawed assumptions: Wasm is much faster than the slow EVM only when the startup overhead is left out. In practice, however, the gains are nullified entirely, and Wasm loses even against very slow VMs like the EVM. Executing Wasm contracts is in fact so inefficient that typical contract workloads are orders of magnitude more expensive than the equivalent EVM variant.
On the other hand, RISC-V is similar to the CPUs found in validator hardware (x86 and ARM), so bytecode translation mostly boils down to a linear, instruction-by-instruction mapping. The embedded ISA specification reduces the number of general-purpose registers far enough that each one can be mapped directly onto a host CPU register, which removes the need for expensive register allocation. This guarantees single-pass O(n) JIT compilation of contract bytecode. Because PVM bytecode is so close to actual validator CPU bytecode, all expensive compilation work effectively moves off-chain. Benchmarks (1, 2) show that with the PVM JIT, sandboxed PVM code executes at around half the speed of native code. This is in the same ballpark as the state-of-the-art wasmtime Wasm implementation, while the EVM sits somewhere between 1/10 and less than 1/100 of native speed. However, the PVM JIT compiler needs only a fraction of the time wasmtime requires to compile the code.
Note
The PVM JIT isn’t available yet in pallet-revive. At the time of writing, contract code is interpreted, which is orders of magnitude slower than the JIT.
Host environment: pallet-revive
The revive compiler targets the pallet-revive runtime environment.
pallet-revive exposes a syscall-like interface for contract interactions with the host environment. This interface is provided by the revive-runtime-api library.
After the initial launch on the Polkadot Asset Hub blockchain, the runtime API is considered stable and backwards compatible indefinitely.
Testing strategy
Contributors are encouraged to implement appropriate unit and integration tests together with any bug fixes or new feature implementations. However, when it comes to testing the code generation logic, our testing strategy goes way beyond simple unit and integration tests. This chapter explains how the revive compiler implementation is tested for correctness and how we define correctness.
Tip
Running the integration tests requires the evm tool from go-ethereum in your $PATH. Either install it using your package manager or build it from source:
git clone https://github.com/ethereum/go-ethereum/
cd go-ethereum
make all
export PATH=/path/to/go-ethereum/build/bin/:$PATH
Bug compatibility with Ethereum Solidity
As a Solidity compiler, we aim to preserve contract code semantics as closely as possible to those of Solidity compiled to EVM with the solc reference implementation. As highlighted in the user guide, this isn’t always possible because the underlying target differs. However, wherever it is possible, we follow the philosophy of bug compatibility with the Ethereum contracts stack.
Differential integration tests
A high level of bug compatibility with Ethereum is ensured through differential testing against the Ethereum solc and EVM contracts stack. The revive-integration library is the central integration test utility, providing a set of Solidity integration test cases. It also implements differential tests against the reference implementation by combining the revive-runner sandbox, the go-ethereum EVM tool and revive-differential.
The revive-runner library provides a declarative test specification format. This vastly simplifies writing differential test cases and removes a lot of room for errors in test logic. Example:
{
"differential": true,
"actions": [
{
"Instantiate": {
"code": {
"Solidity": {
"contract": "Bitwise"
}
}
}
},
{
"Call": {
"dest": {
"Instantiated": 0
},
"data": "3fa4f245"
}
}
]
}
The above example instantiates the Bitwise contract and calls it with predefined calldata. The revive-runner library implements a helper wrapper that executes test specs on the standalone go-ethereum evm tool. This allows the revive-runner to execute specs against both the EVM and the pallet-revive runtime. The key to differential testing is setting "differential": true, which results in the following:
- The Bitwise contract is compiled to EVM and PVM code.
- The runner executes the defined actions on the EVM and collects all state changes (storage, balance) and execution results.
- The runner executes each action on the PVM. The state changes observed after each step, as well as the final execution result, are asserted to match their EVM counterparts exactly.
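The three steps above can be sketched as a small Rust program. It records one observation per action on the reference backend, replays the actions on the backend under test, and fails at the first step whose observation differs. Everything here is hypothetical (Action, Observation, Backend, differential and the toy backend). It only mirrors the shape of the JSON spec and is not the revive-runner API.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the two actions in the JSON spec.
#[derive(Debug)]
enum Action {
    Instantiate { contract: String },
    Call { dest: usize, data: Vec<u8> },
}

/// What a backend exposes after one action: storage, balances and the
/// execution result (Ok = return data, Err = revert data).
#[derive(Debug, PartialEq)]
struct Observation {
    storage: BTreeMap<(usize, u64), u64>,
    balances: BTreeMap<usize, u128>,
    output: Result<Vec<u8>, Vec<u8>>,
}

/// A VM that can execute spec actions, e.g. the EVM or pallet-revive.
trait Backend {
    fn execute(&mut self, action: &Action) -> Observation;
}

/// Collect reference observations on `evm`, then replay every action on
/// `pvm` and require an exact match after each step.
fn differential(actions: &[Action], evm: &mut dyn Backend, pvm: &mut dyn Backend) -> Result<(), String> {
    let expected: Vec<Observation> = actions.iter().map(|a| evm.execute(a)).collect();
    for (step, (action, want)) in actions.iter().zip(&expected).enumerate() {
        let got = pvm.execute(action);
        if &got != want {
            return Err(format!("step {step} ({action:?}) diverged: EVM {want:?}, PVM {got:?}"));
        }
    }
    Ok(())
}

/// Toy backend: instantiation records the contract name; a call stores the
/// calldata length in slot 0 of the callee and echoes the calldata.
/// `bug` adds 1 to the stored value, standing in for a miscompilation.
struct Toy {
    contracts: Vec<String>,
    storage: BTreeMap<(usize, u64), u64>,
    bug: bool,
}

impl Backend for Toy {
    fn execute(&mut self, action: &Action) -> Observation {
        let output = match action {
            Action::Instantiate { contract } => {
                self.contracts.push(contract.clone());
                Ok(Vec::new())
            }
            Action::Call { dest, data } => {
                self.storage.insert((*dest, 0), data.len() as u64 + self.bug as u64);
                Ok(data.clone())
            }
        };
        Observation { storage: self.storage.clone(), balances: BTreeMap::new(), output }
    }
}

fn main() {
    let spec = [
        Action::Instantiate { contract: "Bitwise".to_string() },
        Action::Call { dest: 0, data: vec![0x3f, 0xa4, 0xf2, 0x45] },
    ];
    let toy = |bug| Toy { contracts: Vec::new(), storage: BTreeMap::new(), bug };
    // Identical backends agree on every step.
    println!("{:?}", differential(&spec, &mut toy(false), &mut toy(false)));
    // A divergent backend is caught at the call (step 1).
    println!("{:?}", differential(&spec, &mut toy(true), &mut toy(false)));
}
```

The expected values come from executing the reference backend, so the test contains no hand-written expectations about outcomes.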
Note how we never defined any expected outcome manually. Instead, we simply observe and collect the data defining the “correct” outcome.
Differential testing in combination with declarative test specifications proved to be simple, yet very effective, in ensuring expected Ethereum Solidity semantics on pallet-revive.
The differential testing utility
A lot of nuanced bugs caused by tiny implementation details inside the revive compiler and the pallet-revive runtime could be identified and eliminated early on thanks to the differential testing strategy. Thus, we decided to take this approach further and created a comprehensive test runner and a large suite of more complex test cases.
The Revive Differential Tests follow the exact same strategy but implement a much more powerful test spec format, spec runner and reports. This allows differentially testing much more complex test cases (for example testing Uniswap pair creations and swaps), executed via transactions sent to actual blockchain nodes.
Cross compilation
We cross-compile the resolc.js frontend executable to Wasm for running it in a Node.js or browser environment.
The musl target is used to obtain statically linked ELF binaries for Linux.
Wasm via emscripten
The REVIVE_LLVM_TARGET_PREFIX environment variable is used to control the target environment LLVM dependency. This requires a compatible LLVM build, obtainable via the revive-llvm build script. Example:
# Build the host LLVM dependency with PolkaVM target support
make install-llvm
export LLVM_SYS_221_PREFIX=${PWD}/target-llvm/gnu/target-final
# Build the target LLVM dependency with PolkaVM target support
revive-llvm emsdk
source emsdk/emsdk_env.sh
revive-llvm --target-env emscripten build --llvm-projects lld
export REVIVE_LLVM_TARGET_PREFIX=${PWD}/target-llvm/emscripten/target-final
# Build the resolc frontend executable
make install-wasm
make test-wasm
musl libc
rust-musl-cross is a straightforward way to cross-compile Rust to musl. The Dockerfile is an executable example of how to do that.
FAQ
What EVM version do you support?
The question doesn’t apply: resolc targets PVM, not a particular EVM version. Instead, we support Solidity versions, from solc 0.8.0 onwards.
Is inline assembly supported?
Yes, almost all inline assembly features are supported (see the differences in Yul translation chapter).
Do you support opcode XY?
See above, the same applies.
In what Solidity version should I write my dApp?
We generally recommend always using the latest supported version to benefit from the latest bug fixes, features and performance improvements.
Find out about the latest supported version by running resolc --supported-solc-versions or checking here.
Tool XY says the contract size is larger than 24kb and will fail to deploy?
The 24kb code size restriction only exists for the EVM. Our limit is currently around 1mb and may increase further in the future.
Is resolc a drop-in replacement for solc?
No. resolc aims to work similarly to solc, but it’s not considered a drop-in replacement.
Vision and Roadmap
The revive compiler speeds up Solidity contracts significantly, which gives revive a decisive edge over other contract platforms. Notably, the compiler eliminates the need to rewrite Solidity dApps in Rust, or even as single-dApp parachains, for scaling reasons. Retaining as much compatibility with Ethereum Solidity as possible keeps entry barriers low.
We believe in Dr. Gavin Wood’s ĐApps: What Web 3.0 Looks Like manifesto and in the ecosystem of the Solidity programming language. Our motivation lies in the realization that a true web3 revolution can only unfold with significant scaling efforts, like the ones provided by the PVM and this project.
Roadmap
The first major release, resolc v1.0.0, emits functional PVM code from given Solidity sources. It relies on solc and LLVM for optimizations. The main priority of this release was delivering a mostly feature-complete and safe Solidity v0.8.0 compiler.
The second major release focuses on the custom optimization pipeline, which aims to significantly reduce the size of emitted code blobs.
The roadmap below gives a rough overview of the project’s development timeline.