Expand description

Learn about how to write safe and defensive code in your FRAME runtime. Defensive programming is a design paradigm that enables a program to continue running despite unexpected behavior, input, or events that may arise in runtime. Usually, unforeseen circumstances may cause the program to stop or, in the Rust context, panic!. Defensive practices allow for these circumstances to be accounted for ahead of time and for them to be handled gracefully, which is in line with the intended fault-tolerant and deterministic nature of blockchains.

The Polkadot SDK is built to reflect these principles and to facilitate their usage accordingly.

§General Overview

When developing within the context of the Substrate runtime, there is one golden rule:

DO NOT PANIC. There are some exceptions, but generally, this is the default precedent.

It’s important to differentiate between the runtime and node. The runtime refers to the core business logic of a Substrate-based chain, whereas the node refers to the outer client, which deals with telemetry and gossip from other nodes. For more information, read about Substrate’s node architecture. It’s also important to note that the criticality of the node is slightly lesser than that of the runtime, which is why you may see unwrap() or other “non-defensive” approaches in a few places of the node’s code repository.

Most of these practices fall within Rust’s colloquial usage of proper error propagation, handling, and arithmetic-based edge cases.

General guidelines:

  • Avoid writing functions that could explicitly panic, such as directly using unwrap() on a Result, or accessing an out-of-bounds index on a collection. Safer methods to access collection types, i.e., get() which allow defensive handling of the resulting Option are recommended to be used.
  • It may be acceptable to use except(), but only if one is completely certain (and has performed a check beforehand) that a value won’t panic upon unwrapping. Even this is discouraged, however, as future changes to that function could then cause that statement to panic. It is important to ensure all possible errors are propagated and handled effectively.
  • If a function can panic, it usually is prefaced with unchecked_ to indicate its unsafety.
  • If you are writing a function that could panic, document it!
  • Carefully handle mathematical operations. Many seemingly, simplistic operations, such as arithmetic in the runtime, could present a number of issues (see more later in this document). Use checked arithmetic wherever possible.

These guidelines could be summarized in the following example, where bad_pop is prone to panicking, and good_pop allows for proper error handling to take place:

 // Bad pop always requires that we return something, even if vector/array is empty.
 fn bad_pop<T>(v: Vec<T>) -> T {}
 // Good pop allows us to return None from the Option if need be.
 fn good_pop<T>(v: Vec<T>) -> Option<T> {}

§Defensive Traits

The Defensive trait provides a number of functions, all of which provide an alternative to ‘vanilla’ Rust functions, e.g.:

Defensive methods use debug_assertions, which panic in development, but in production/release, they will merely log an error (i.e., log::error).

The Defensive trait and its various implementations can be found here.

§Integer Overflow

The Rust compiler prevents static overflow from happening at compile time. The compiler panics in debug mode in the event of an integer overflow. In release mode, it resorts to silently wrapping the overflowed amount in a modular fashion (from the MAX back to zero).

In runtime development, we don’t always have control over what is being supplied as a parameter. For example, even this simple add function could present one of two outcomes depending on whether it is in release or debug mode:

fn naive_add(x: u8, y: u8) -> u8 {
    x + y
}

If we passed overflow-able values at runtime, this could panic (or wrap if in release).

naive_add(250u8, 10u8); // In debug mode, this would panic. In release, this would return 4.

It is the silent portion of this behavior that presents a real issue. Such behavior should be made obvious, especially in blockchain development, where unsafe arithmetic could produce unexpected consequences like a user balance over or underflowing.

Fortunately, there are ways to both represent and handle these scenarios depending on our specific use case natively built into Rust and libraries like sp_arithmetic.

§Infallible Arithmetic

Both Rust and Substrate provide safe ways to deal with numbers and alternatives to floating point arithmetic.

Known scenarios that could be fallible should be avoided: i.e., avoiding the possibility of dividing/modulo by zero at any point should be mitigated. One should be opting for a checked_* method to introduce safe arithmetic in their code in most cases.

A developer should use fixed-point instead of floating-point arithmetic to mitigate the potential for inaccuracy, rounding errors, or other unexpected behavior.

Using floating point number types (i.e. f32, f64) in the runtime should be avoided, as a single non-deterministic result could cause chaos for blockchain consensus along with the issues above. For more on the specifics of the peculiarities of floating point calculations, watch this video by the Computerphile.

The following methods demonstrate different ways to handle numbers natively in Rust safely, without fear of panic or unexpected behavior from wrapping.

§Checked Arithmetic

Checked operations utilize an Option<T> as a return type. This allows for catching any unexpected behavior in the event of an overflow through simple pattern matching.

This is an example of a valid operation:

#[test]
fn checked_add_example() {
	// This is valid, as 20 is perfectly within the bounds of u32.
	let add = (10u32).checked_add(10);
	assert_eq!(add, Some(20))
}

This is an example of an invalid operation. In this case, a simulated integer overflow, which would simply result in None:

#[test]
fn checked_add_handle_error_example() {
	// This is invalid - we are adding something to the max of u32::MAX, which would overflow.
	// Luckily, checked_add just marks this as None!
	let add = u32::MAX.checked_add(10);
	assert_eq!(add, None)
}

Suppose you aren’t sure which operation to use for runtime math. In that case, checked operations are the safest bet, presenting two predictable (and erroring) outcomes that can be handled accordingly (Some and None).

The following conventions can be seen within the Polkadot SDK, where it is handled in two ways:

§Handling via Option - More Verbose

Because wrapped operations return Option<T>, you can use a more verbose/explicit form of error handling via if or if let:

fn increase_balance(account: Address, amount: u64) -> Result<(), RuntimeError> {
	// Get a user's current balance
	let balance = Runtime::get_balance(account)?;
	// SAFELY increase the balance by some amount
	if let Some(new_balance) = balance.checked_add(amount) {
		Runtime::set_balance(account, new_balance);
		Ok(())
	} else {
		Err(RuntimeError::Overflow)
	}
}

Optionally, match may also be directly used in a more concise manner:

fn increase_balance_match(account: Address, amount: u64) -> Result<(), RuntimeError> {
	// Get a user's current balance
	let balance = Runtime::get_balance(account)?;
	// SAFELY increase the balance by some amount
	let new_balance = match balance.checked_add(amount) {
		Some(balance) => balance,
		None => {
			return Err(RuntimeError::Overflow);
		},
	};
	Runtime::set_balance(account, new_balance);
	Ok(())
}

This is generally a useful convention for handling checked types and most types that return Option<T>.

§Handling via Result - Less Verbose

In the Polkadot SDK codebase, checked operations are handled as a Result via ok_or. This is a less verbose way of expressing the above. This usage often boils down to the developer’s preference:

fn increase_balance_result(account: Address, amount: u64) -> Result<(), RuntimeError> {
	// Get a user's current balance
	let balance = Runtime::get_balance(account)?;
	// SAFELY increase the balance by some amount - this time, by using `ok_or`
	let new_balance = balance.checked_add(amount).ok_or(RuntimeError::Overflow)?;
	Runtime::set_balance(account, new_balance);
	Ok(())
}

§Saturating Operations

Saturating a number limits it to the type’s upper or lower bound, even if the integer type overflowed in runtime. For example, adding to u32::MAX would simply limit itself to u32::MAX:

#[test]
fn saturated_add_example() {
	// Saturating add simply saturates
	// to the numeric bound of that type if it overflows.
	let add = u32::MAX.saturating_add(10);
	assert_eq!(add, u32::MAX)
}

Saturating calculations can be used if one is very sure that something won’t overflow, but wants to avoid introducing the notion of any potential-panic or wrapping behavior.

There is also a series of defensive alternatives via DefensiveSaturating, which introduces the same behavior of the Defensive trait, only with saturating, mathematical operations:

#[test]
#[cfg_attr(debug_assertions, should_panic(expected = "Defensive failure has been triggered!"))]
fn saturated_defensive_example() {
	let saturated_defensive = u32::MAX.defensive_saturating_add(10);
	assert_eq!(saturated_defensive, u32::MAX);
}

§Mathematical Operations in Substrate Development - Further Context

As a recap, we covered the following concepts:

  1. Checked operations - using Option or Result
  2. Saturating operations - limited to the lower and upper bounds of a number type
  3. Wrapped operations (the default) - wrap around to above or below the bounds of a type
§The problem with ‘default’ wrapped operations

Wrapped operations cause the overflow to wrap around to either the maximum or minimum of that type. Imagine this in the context of a blockchain, where there are account balances, voting counters, nonces for transactions, and other aspects of a blockchain.

While it may seem trivial, choosing how to handle numbers is quite important. As a thought exercise, here are some scenarios of which will shed more light on when to use which.

§Bob’s Overflowed Balance

Bob’s balance exceeds the Balance type on the EduChain. Because the pallet developer did not handle the calculation to add to Bob’s balance with any regard to this overflow, Bob’s balance is now essentially 0, the operation wrapped.

Solution: Saturating or Checked For Bob's balance problems, using a `saturating_add` or `checked_add` could've mitigated this issue. They simply would've reached the upper, or lower bounds, of the particular type for an on-chain balance. In other words: Bob's balance would've stayed at the maximum of the Balance type.
§Alice’s ‘Underflowed’ Balance

Alice’s balance has reached 0 after a transfer to Bob. Suddenly, she has been slashed on EduChain, causing her balance to reach near the limit of u32::MAX - a very large amount - as wrapped operations can go both ways. Alice can now successfully vote using her new, overpowered token balance, destroying the chain’s integrity.

Solution: Saturating For Alice's balance problem, using `saturated_sub` could've mitigated this issue. A saturating calculation would've simply limited her balance to the lower bound of u32, as having a negative balance is not a concept within blockchains. In other words: Alice's balance would've stayed at "0", even after being slashed.

This is also an example that while one system may work in isolation, shared interfaces, such as the notion of balances, are often shared across multiple pallets - meaning these small changes can make a big difference depending on the scenario.

§Proposal ID Overwrite

A u8 parameter, called proposals_count, represents the type for counting the number of proposals on-chain. Every time a new proposal is added to the system, this number increases. With the proposal pallet’s high usage, it has reached u8::MAX’s limit of 255, causing proposals_count to go to 0. Unfortunately, this results in new proposals overwriting old ones, effectively erasing any notion of past proposals!

Solution: Checked For the proposal IDs, proper handling via `checked` math would've been suitable, Saturating could've been used - but it also would've 'failed' silently. Using `checked_add` to ensure that the next proposal ID would've been valid would've been a viable way to let the user know the state of their proposal:
let next_proposal_id = current_count.checked_add(1).ok_or_else(|| Error::TooManyProposals)?;

From the above, we can clearly see the problematic nature of seemingly simple operations in the runtime, and care should be given to ensure a defensive approach is taken.

§Edge cases of panic!-able instances in Substrate

As you traverse through the codebase (particularly in substrate/frame, where the majority of runtime code lives), you may notice that there (only a few!) occurrences where panic! is used explicitly. This is used when the runtime should stall, rather than keep running, as that is considered safer. Particularly when it comes to mission-critical components, such as block authoring, consensus, or other protocol-level dependencies, going through with an action may actually cause harm to the network, and thus stalling would be the better option.

Take the example of the BABE pallet (pallet_babe), which doesn’t allow for a validator to participate if it is disabled (see: frame::traits::DisabledValidators):

if T::DisabledValidators::is_disabled(authority_index) {
    panic!(
      "Validator with index {:?} is disabled and should not be attempting to author blocks.",
        authority_index,
    );
}

There are other examples in various pallets, mostly those crucial to the blockchain’s functionality. Most of the time, you will not be writing pallets which operate at this level, but these exceptions should be noted regardless.

§Other Resources