
Module enable_elastic_scaling


How to enable elastic scaling on a parachain.

§Enable elastic scaling for a parachain

This guide assumes full familiarity with Asynchronous Backing and its terminology, as defined in the Polkadot SDK Docs.

§Quick introduction to Elastic Scaling

Elastic scaling is a feature that enables parachains (rollups) to use multiple cores. Parachains can adjust their usage of core resources on the fly to increase TPS and decrease latency.

§When do you need Elastic Scaling?

Depending on their use case, applications might have an increased need for the following:

  • compute (CPU weight)
  • bandwidth (proof size)
  • lower latency (block time)

§High throughput (TPS) and lower latency

If the main bottleneck is the CPU, then your parachain needs to maximize the compute usage of each core while also achieving lower latency. Three cores provide the best balance between CPU, bandwidth and latency: up to 6s of execution, 5 MB/s of DA bandwidth and a fast block time of just 2 seconds.

§High bandwidth

Useful for applications that are bottlenecked by bandwidth. By using 6 cores, applications can make use of up to 6s of compute and 10 MB/s of bandwidth while also achieving 1-second block times.

§Ultra low latency

When latency is the primary requirement, elastic scaling is currently the only solution. The caveat is that the efficiency of core time usage decreases as more cores are used.

For example, using 12 cores enables fast transaction confirmations with 500ms blocks and up to 20 MB/s of DA bandwidth.
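
The block times above follow directly from the relay chain slot duration. As a minimal sketch of the arithmetic, assuming one parachain block is authored per core per 6-second relay chain slot (`min_block_time_ms` is an illustrative helper, not an SDK API):

    /// Relay chain slot duration, in milliseconds.
    const RELAY_CHAIN_SLOT_DURATION_MILLIS: u32 = 6000;

    /// Minimum parachain block time when using `cores` cores, assuming one
    /// parachain block is authored per core per relay chain slot.
    const fn min_block_time_ms(cores: u32) -> u32 {
        RELAY_CHAIN_SLOT_DURATION_MILLIS / cores
    }

    // 3 cores  -> 2000 ms blocks
    // 6 cores  -> 1000 ms blocks
    // 12 cores ->  500 ms blocks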

§Dependencies

Prerequisites: Polkadot-SDK 2509 or newer.

To ensure the security and reliability of your chain when using this feature you need the following:

  • An omni-node based collator. This has already become the default choice for collators.
  • UMP signal support (RFC103). This is mandatory protection against PoV replay attacks.
  • Enabling the relay parent offset feature. This is required to ensure the parachain block times and transaction in-block confidence are not negatively affected by relay chain forks. Read crate::guides::handling_parachain_forks for more information.
  • Block production configuration adjustments.

§Upgrade to Polkadot Omni node

Your collators need to run polkadot-parachain or polkadot-omni-node with the --authoring slot-based CLI argument. To avoid potential issues and get the best performance, it is recommended to always run the latest release on all of the collators.
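
An invocation might look like the following, where the chain spec path is a placeholder and any other flags from your existing setup stay unchanged:

    polkadot-omni-node --chain <your-chain-spec.json> --authoring slot-based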

Further information about omni-node and how to upgrade is available in the Polkadot SDK documentation.

§Enable UMP signals

The only required change for the runtime is enabling the experimental-ump-signals feature of the parachain-system pallet:

    cumulus-pallet-parachain-system = { workspace = true, features = ["experimental-ump-signals"] }

You can find more technical details about UMP signals and their usage for elastic scaling in RFC103.

§Enable the relay parent offset feature

It is recommended to use an offset of 1, which is sufficient to eliminate any issues with relay chain forks.

Configure the relay parent offset like this:

    /// Build with an offset of 1 behind the relay chain best block.
    const RELAY_PARENT_OFFSET: u32 = 1;

    impl cumulus_pallet_parachain_system::Config for Runtime {
        // ...
        type RelayParentOffset = ConstU32<RELAY_PARENT_OFFSET>;
    }

Implement the runtime API to retrieve the offset on the client side.

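    // This runtime API is typically implemented inside the runtime's `impl_runtime_apis!`
    // block, alongside the other runtime APIs.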
    impl cumulus_primitives_core::RelayParentOffsetApi<Block> for Runtime {
        fn relay_parent_offset() -> u32 {
            RELAY_PARENT_OFFSET
        }
    }

§Block production configuration

This configuration directly controls the minimum block time and maximum number of cores the parachain can use.

Example configuration for a 3 core parachain:

    /// The upper limit of how many parachain blocks are processed by the relay chain per
    /// parent. Limits the number of blocks authored per slot. This determines the minimum
    /// block time of the parachain:
    /// `RELAY_CHAIN_SLOT_DURATION_MILLIS / BLOCK_PROCESSING_VELOCITY`
    const BLOCK_PROCESSING_VELOCITY: u32 = 3;

    /// Maximum number of blocks simultaneously accepted by the Runtime, not yet included
    /// into the relay chain.
    const UNINCLUDED_SEGMENT_CAPACITY: u32 =
        (2 + RELAY_PARENT_OFFSET) * BLOCK_PROCESSING_VELOCITY + 1;

    /// Relay chain slot duration, in milliseconds.
    const RELAY_CHAIN_SLOT_DURATION_MILLIS: u32 = 6000;

    type ConsensusHook = cumulus_pallet_aura_ext::FixedVelocityConsensusHook<
        Runtime,
        RELAY_CHAIN_SLOT_DURATION_MILLIS,
        BLOCK_PROCESSING_VELOCITY,
        UNINCLUDED_SEGMENT_CAPACITY,
    >;
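
With the values above (and RELAY_PARENT_OFFSET = 1 from the previous section), the minimum parachain block time is RELAY_CHAIN_SLOT_DURATION_MILLIS / BLOCK_PROCESSING_VELOCITY = 6000 / 3 = 2000 ms, and UNINCLUDED_SEGMENT_CAPACITY = (2 + 1) * 3 + 1 = 10 blocks.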

§Current limitations

§Maximum execution time per relay chain block

Since parachain block authoring is sequential, the next block can only be built after the previous one has been imported. At present, a core allows up to 2 seconds of execution per relay chain block.

If we assume a 6s parachain slot, and each block takes the full 2 seconds to execute, the parachain will not be able to fully utilize the compute resources of all 3 cores.

If the collator hardware is faster, it can author and import full blocks more quickly, making it possible to utilize even more than 3 cores efficiently.

§Why?

Within a 6-second parachain slot, collators can author multiple parachain blocks. Before building the first block in a slot, the new block author must import the last block produced by the previous author. If the import of the last block is not completed before the next relay chain slot starts, the new author will build on its parent (assuming it was imported). This creates a fork, which degrades parachain block confidence and block times.

This means that, on reference hardware, a parachain with a slot time of 6s can effectively utilize up to 4 seconds of execution per relay chain block, because it needs to ensure the next block author has enough time to import the last block. Hardware with higher single-core performance can enable a parachain to fully utilize more cores.
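
For example, with 3 cores and 2-second blocks, that 4-second budget corresponds to roughly 1.3 seconds of execution per parachain block, leaving the remaining time for importing the previous block.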

§Fixed factor scaling

For true elasticity, a parachain needs to acquire more cores when needed in an automated manner. This functionality is not yet available in the SDK, thus acquiring additional on-demand or bulk cores has to be managed externally.