RFC-0166: Snowbridge Emergency Pause Pallet


Start Date	2026-05-28
Description	A permissionless, deposit-gated emergency pause for Snowbridge that halts both sides of the bridge via best-effort calls with on-chain retry, resolved by OpenGov.
Authors	Snowbridge team

Summary

Snowbridge currently cannot be halted immediately. If an exploit is detected, the best available option is a whitelisted caller proposal through OpenGov. The drawback is clear: there is no fast way to halt the bridge on-chain (off-chain relayers can be switched off, but that is not a foolproof stopgap). This RFC proposes a permissionless, instant Snowbridge halt that any caller can trigger by depositing a large sum of DOT, which is slashed if the halt was malicious. This proposal is a reactive security measure (i.e. an exploit or vulnerability first needs to be visible for this functionality to be useful). Another proposal, Snowbridge Circuit Breakers, is proposed alongside this RFC for a more proactive approach.

Motivation

Snowbridge has no near-immediate halt path today. Existing governance halt routes require a referendum and Fellowship action, with hours-to-days latency. That is too slow both for stopping an active drainage exploit and for pausing activity during an investigation.

Investigation into the new TX Pause pallet and Safe Mode pallet (polkadot-fellows/runtimes PR #1164) revealed parts that can be reused and referenced, but they do not resolve Snowbridge's need directly. Pallet Safe Mode blocks all calls except a configured whitelist, which is the wrong way round for this use case: halting only Snowbridge would mean whitelisting essentially the whole rest of the chain rather than targeting the bridge, so it acts as a chain-wide brake rather than a per-component one. Snowbridge also requires a multi-chain freeze that spans the Ethereum contracts, Bridge Hub and Asset Hub, and neither of these two existing pallets supports inter-chain messaging. Finally, pallet TX Pause requires a privileged origin, whereas Snowbridge requires a permissionless pausing mechanism gated by a sizeable, slashable deposit.

Stakeholders

Polkadot OpenGov, the resolution authority that resumes the bridge and decides between genuine (refund) and malicious (slash) triggers.
Snowbridge maintainers, who implement and operate the halt path.
Snowbridge users and integrators, who experience a halt as the bridge being closed at submit time on both Ethereum and Asset Hub.
The Dynamic Allocation Pool (DAP), the destination of slashed deposits on malicious triggers, consistent with where other Polkadot slashes now go.

Explanation

Goal

A permissionless DOT deposit triggers a complete Snowbridge halt, in response to a possible exploit (stop new activity while investigating) or an active exploit (an attacker is actively draining value).

Halt and resolution authority

Authority is deliberately split: anyone can halt, only OpenGov can resolve. The halt is permissionless, gated only by a large slashable deposit. An emergency stop has to be fast and open to whoever spots an exploit, so putting it behind a privileged origin would reintroduce the latency this proposal exists to remove, and the deposit deters griefing (lost if the halt was malicious, returned if genuine).

Resolution, resuming the bridge and deciding slash versus refund, sits with OpenGov, not the halter or the Fellowship. The halter must not resolve, or a malicious caller could hold the bridge down or reopen it to suit their exploit. The Fellowship should not, since judging whether an incident is over is an operational call, not the technical stewardship it exists for. OpenGov is the natural authority and is always available, so no automatic time-based resume is needed. The asymmetry is intentional.

Implementation

The proposed implementation starts with an entry point extrinsic, halt, on Bridge Hub (in a new pallet). The extrinsic requires a DOT deposit. Once a valid deposit has been reserved, the pallet state changes to Halted and the bridge is halted in both directions. The halt is graceful: messages that were already in flight, in either direction, are held and sent once the bridge resumes rather than being lost, with one bounded exception on the P→E side described below. While the pallet is in the Halted state, further calls to halt fail.
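
The state transition described above can be sketched as follows. This is a minimal, plain-Rust model and not FRAME pallet code: the type names (HaltPallet, BridgeState, HaltError), the in-memory balance map and the integer account ids are all assumptions made for illustration. It only shows the ordering the text describes: reject if already halted, reserve the deposit, then move to Halted.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for the runtime's account and balance types.
pub type AccountId = u32;
pub type Balance = u128;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BridgeState {
    Normal,
    Halted,
}

#[derive(Debug, PartialEq, Eq)]
pub enum HaltError {
    AlreadyHalted,
    InsufficientBalance,
}

/// In-memory model of the halt pallet's storage (illustrative only).
pub struct HaltPallet {
    pub state: BridgeState,
    pub required_deposit: Balance,
    pub free: HashMap<AccountId, Balance>,
    /// The halter and the reserved amount, kept until OpenGov resolves it.
    pub deposit: Option<(AccountId, Balance)>,
}

impl HaltPallet {
    pub fn new(required_deposit: Balance) -> Self {
        Self {
            state: BridgeState::Normal,
            required_deposit,
            free: HashMap::new(),
            deposit: None,
        }
    }

    /// Permissionless entry point: reserve the deposit, then halt.
    pub fn halt(&mut self, who: AccountId) -> Result<(), HaltError> {
        // Follow-up calls while already halted are rejected.
        if self.state == BridgeState::Halted {
            return Err(HaltError::AlreadyHalted);
        }
        let free = self
            .free
            .get_mut(&who)
            .ok_or(HaltError::InsufficientBalance)?;
        if *free < self.required_deposit {
            return Err(HaltError::InsufficientBalance);
        }
        // Reserve first, so the state only changes once the deposit is held.
        *free -= self.required_deposit;
        self.deposit = Some((who, self.required_deposit));
        self.state = BridgeState::Halted;
        Ok(())
    }
}
```

In this model a caller without enough free balance leaves the state at Normal, and a second halt call after a successful one returns AlreadyHalted without touching any balance.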

The halt blocks new transfers from entering the bridge:

1. snowbridgeSystemFrontend::set_operating_mode(Halted): sets the frontend's export mode on Asset Hub, blocking new P→E transfers there.
2. Outbound governance command issued from Bridge Hub via snowbridge-pallet-system-v2: sets the Gateway's operating mode, blocking new E→P transfers on Ethereum.
3. EthereumBeaconClient::set_operating_mode(Halted): a local Bridge Hub call that stops new Ethereum header ingestion, as defense-in-depth.

Messages that were already in flight when the halt landed are not rejected. Rejecting them would leave the bridge inconsistent, for example an asset burned on one side with nothing minted on the other. Instead, in-flight transfers are held and sent on resume, as described below.

These calls are all best-effort, and failure does not prevent the other calls from being executed. The pallet attempts each, logs successes and failures, and re-attempts pending calls in later blocks via on_initialize.
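
A minimal sketch of the best-effort behaviour, assuming a simple list of pending calls that is re-attempted once per block. The names (HaltCall, PendingCalls, attempt) are hypothetical, and the dispatch closure stands in for the real cross-chain and local calls; in the pallet itself the re-attempt would be driven from on_initialize.

```rust
/// The three halt calls from the list above (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HaltCall {
    FrontendHalt,
    GatewayHalt,
    BeaconClientHalt,
}

pub struct PendingCalls {
    /// Calls that have not yet succeeded.
    pub pending: Vec<HaltCall>,
    /// Every attempt and whether it succeeded.
    pub log: Vec<(HaltCall, bool)>,
}

impl PendingCalls {
    /// Start with all three calls pending.
    pub fn all() -> Self {
        Self {
            pending: vec![
                HaltCall::FrontendHalt,
                HaltCall::GatewayHalt,
                HaltCall::BeaconClientHalt,
            ],
            log: Vec::new(),
        }
    }

    /// Attempt each pending call once. A failure is logged and kept for a
    /// later block; it never prevents the remaining calls from being tried.
    pub fn attempt<F: FnMut(HaltCall) -> bool>(&mut self, mut dispatch: F) {
        let calls = std::mem::take(&mut self.pending);
        for call in calls {
            let ok = dispatch(call);
            self.log.push((call, ok));
            if !ok {
                self.pending.push(call);
            }
        }
    }
}
```

Calling attempt once in the halt extrinsic and once per later block until pending is empty gives the retry behaviour described; a retry interval or backoff would be layered on top of this.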

The holds described below depend on the new halt pallet's Halted state, which the inbound and outbound message handling on Bridge Hub read directly to decide whether to hold a message or process it as normal.

For P→E transfers, in-flight messages are held using the MessageQueue. Outbound messages only get a nonce when they are committed for relay to Ethereum, so while the bridge is halted they are accepted into the queue but not committed. When the bridge resumes, they are committed with fresh nonces, which avoids any stale light client proofs.

A small number of P→E messages may have been committed and relayed just before the halt landed, and these cannot be held on Bridge Hub. To cover them, we should also consider a bridge operating mode check in the Ethereum submitV1 and submitV2 contracts, so that they are not processed on Ethereum while the bridge is halted.

There is one further edge case: a transfer from another parachain with destination Ethereum can reach Asset Hub after the frontend is halted. Its assets have already left the origin chain, and there is no clean way to prevent the message from being sent. The recovery action in this scenario is for the frontend to reject the message, which leaves the assets trapped on Asset Hub, where the original user should reclaim them. While this is not the best UX, we accept it as a tradeoff, since keeping bridge funds safe is the first priority in these cases. Partners should be alerted immediately when the bridge is halted.
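
The outbound hold (messages accepted into the queue but not committed while halted, with nonces assigned only at commit time) can be sketched as below. This is an illustrative model under assumed names (OutboundQueue, enqueue, process); it does not reflect the actual MessageQueue or outbound queue pallet APIs.

```rust
use std::collections::VecDeque;

/// Illustrative model of the P→E outbound path on Bridge Hub.
#[derive(Default)]
pub struct OutboundQueue {
    /// Mirrors the halt pallet's Halted state.
    pub halted: bool,
    pub next_nonce: u64,
    /// Accepted but not yet committed messages.
    pub queued: VecDeque<Vec<u8>>,
    /// Committed messages with the nonce assigned at commit time.
    pub committed: Vec<(u64, Vec<u8>)>,
}

impl OutboundQueue {
    /// Messages are always accepted into the queue, halted or not.
    pub fn enqueue(&mut self, payload: Vec<u8>) {
        self.queued.push_back(payload);
    }

    /// Commit queued messages. While halted this is a no-op, so held
    /// messages never receive a nonce until the bridge resumes.
    pub fn process(&mut self) -> usize {
        if self.halted {
            return 0;
        }
        let mut count = 0;
        while let Some(payload) = self.queued.pop_front() {
            self.committed.push((self.next_nonce, payload));
            self.next_nonce += 1;
            count += 1;
        }
        count
    }
}
```

Because the nonce is taken only inside process, messages queued during a halt are committed in order with fresh, contiguous nonces after resume.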

For E→P transfers, halting holds messages in storage until the bridge resumes. While the bridge is halted, incoming messages are still verified but, rather than being forwarded to Asset Hub, they are kept in the pallet's storage. When the bridge resumes, the held messages are sent on. This applies to the V2 inbound path only, since the older V1 path is being deprecated.

In both directions, the held messages can be inspected while the bridge is halted and any malicious ones removed via a governance approved migration before it resumes.
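
A minimal sketch of the inbound (E→P) hold, including removal of a held message before resume. All names here (InboundQueue, InboundMessage, Outcome, remove_held) are hypothetical, and verification is reduced to a boolean flag supplied by the caller; the real path would verify against the beacon light client.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InboundMessage {
    pub nonce: u64,
    pub payload: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Forwarded,
    Held,
    Rejected,
}

/// Illustrative model of the V2 inbound path on Bridge Hub.
#[derive(Default)]
pub struct InboundQueue {
    pub halted: bool,
    /// Verified messages kept in storage while halted.
    pub held: Vec<InboundMessage>,
    /// Messages sent on to Asset Hub.
    pub forwarded: Vec<InboundMessage>,
}

impl InboundQueue {
    /// Verification still applies while halted; only forwarding is deferred.
    pub fn submit(&mut self, msg: InboundMessage, verified: bool) -> Outcome {
        if !verified {
            return Outcome::Rejected;
        }
        if self.halted {
            self.held.push(msg);
            Outcome::Held
        } else {
            self.forwarded.push(msg);
            Outcome::Forwarded
        }
    }

    /// Stand-in for a governance-approved removal of a malicious message.
    pub fn remove_held(&mut self, nonce: u64) -> bool {
        let before = self.held.len();
        self.held.retain(|m| m.nonce != nonce);
        self.held.len() != before
    }

    /// On resume, forward every remaining held message in order.
    pub fn resume(&mut self) -> usize {
        self.halted = false;
        let released = self.held.len();
        self.forwarded.append(&mut self.held);
        released
    }
}
```

In this model a message removed with remove_held is never forwarded, while everything else that was held is released in arrival order when resume is called.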

Resuming the bridge

Resume is the symmetric inverse of the halt. Both resuming the bridge and resolving the halt deposit are done through OpenGov referenda. The whitelisted caller track should be used, where the decision still rests with OpenGov. Resume and the deposit resolution are separate extrinsics, so OpenGov can bundle them in a single referendum (e.g. resume + slash) or, where a scenario needs a longer halted state, refund the caller without resuming yet.

An automatic resume after a set duration was considered as a fallback in case the resolution authority were unavailable. With OpenGov as that authority, there is no separate body that can be unavailable: it is the base governance layer, so a stuck halt would only coincide with much larger Polkadot problems. To avoid unnecessary complexity as well, auto-resume was removed from this spec.

The resume extrinsic should perform the inverse of every operation described in the previous section and set the pallet state to Normal. While the async calls execute, parts of the bridge may still be halted, but since this window is short (1-2 minutes), the temporary inconsistency is acceptable.

Releasing or slashing the deposit

The pallet should add two extrinsics to resolve the halting deposit, slash and refund, both voted on by OpenGov. Slashing the deposit should send it to the Dynamic Allocation Pool (DAP), where other Polkadot slashes now go, routed from Bridge Hub via the dap-satellite pallet. Refunding the deposit should release the funds back to the caller. The reason behind the slash or refund should be captured on-chain as text.
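
The two resolution extrinsics can be sketched as below. This is an illustrative model only: the Origin enum, the Deposits struct, the dap_pot field and the in-memory history are assumptions, and routing through the dap-satellite pallet is reduced to adding to a counter. It shows that only the governance origin can resolve, that a deposit can be resolved once, and that the reason is recorded as text.

```rust
use std::collections::HashMap;

pub type AccountId = u32;
pub type Balance = u128;

/// Simplified origins: a signed account or an OpenGov referendum.
pub enum Origin {
    Signed(AccountId),
    OpenGov,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Resolution {
    Refunded { to: AccountId, amount: Balance, reason: String },
    Slashed { amount: Balance, reason: String },
}

#[derive(Debug, PartialEq, Eq)]
pub enum ResolveError {
    NotGovernance,
    NoDeposit,
}

#[derive(Default)]
pub struct Deposits {
    /// The halter and reserved amount awaiting resolution.
    pub held: Option<(AccountId, Balance)>,
    pub free: HashMap<AccountId, Balance>,
    /// Stand-in for funds routed to the Dynamic Allocation Pool.
    pub dap_pot: Balance,
    /// Stand-in for the on-chain record of each decision and its reason.
    pub history: Vec<Resolution>,
}

impl Deposits {
    /// Shared checks: governance origin only, and a deposit must exist.
    fn take_deposit(&mut self, origin: &Origin) -> Result<(AccountId, Balance), ResolveError> {
        if !matches!(origin, Origin::OpenGov) {
            return Err(ResolveError::NotGovernance);
        }
        self.held.take().ok_or(ResolveError::NoDeposit)
    }

    /// Genuine trigger: release the deposit back to the halter.
    pub fn refund(&mut self, origin: Origin, reason: &str) -> Result<(), ResolveError> {
        let (to, amount) = self.take_deposit(&origin)?;
        *self.free.entry(to).or_insert(0) += amount;
        self.history.push(Resolution::Refunded { to, amount, reason: reason.to_string() });
        Ok(())
    }

    /// Malicious trigger: send the deposit to the DAP.
    pub fn slash(&mut self, origin: Origin, reason: &str) -> Result<(), ResolveError> {
        let (_, amount) = self.take_deposit(&origin)?;
        self.dap_pot += amount;
        self.history.push(Resolution::Slashed { amount, reason: reason.to_string() });
        Ok(())
    }
}
```

Neither function touches the bridge state, which matches the text: resume and deposit resolution stay separate so that OpenGov can combine them as a given incident requires.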

Threat model coverage

New E→P entry: the Gateway halt (call 2) stops new transfers starting on Ethereum.
E→P in-flight and inbound-queue exploits: the inbound hold covers exploits that bypass the Gateway entirely (malformed proofs, payload-decode bugs, MMR weaknesses). Messages are verified and held, and malicious ones can be dropped before reaching Asset Hub.
New P→E entry: the Asset Hub frontend halt (call 1) stops new transfers at Asset Hub.
P→E in-flight: the outbound hold keeps messages queued and uncommitted until resume.
Beacon-client exploit: the beacon client halt (call 3) stops new Ethereum header ingestion.

Drawbacks

Griefing: This proposal adds permissionless halting, guarded by a slashable deposit. Someone who is willing to lose funds to censor the bridge could repeatedly call the permissionless halt. In practice, this seems unlikely. Should it happen, the deposit amount can be raised as a further deterrent.
Best-effort halt: Since the halt relies on async calls to multiple chains, some of the halt calls might fail. Failed calls are re-attempted in later blocks, but until they succeed the corresponding part of the bridge remains open.

Testing, Security, and Privacy

Pallet unit tests: Standard unit tests covering the pallet code (including halting, holding and replaying inbound messages, deposit under sufficient/insufficient balance, bridge resuming, extend).
Integration tests: Tests that call the pallet extrinsic and verify that all the expected effects occur (all the Bridge Hub halt events trigger, the outbound message to Ethereum is queued, and Asset Hub receives and processes the Snowbridge system frontend halt message).
End-to-end simulation (chopsticks fork): Polkadot ecosystem tests to verify that all the correct behaviour executes against a fork of Polkadot mainnet.
Security posture: the pallet creates a new attack surface. This is the intended design, calibrated against the asymmetric harm of being unable to halt during an active drainage.

Performance, Ergonomics, and Compatibility

Performance

Performance is not a significant concern for this RFC, since the halt is gated by a large deposit and is unlikely to ever receive high traffic. That said, Bridge Hub local operations are O(1) storage writes. The outbound calls to Ethereum and Asset Hub are well-defined and raise no performance concern.

Ergonomics

The permissionless halt trigger is an extrinsic that requires a large amount of DOT (to be determined, around 100k DOT) in the signer's account for the deposit. Off-chain relayers should watch for events from the new pallet and stop relaying messages once they observe the Halted state.

The second user of this new function is OpenGov, which resolves the halt: resume, slash, refund or extend.

Compatibility

This proposal mostly adds new functionality. The main changes to existing Snowbridge components are on Bridge Hub: while the bridge is paused, outbound P→E messages are held in the queue rather than committed, and inbound E→P messages are held in storage rather than dispatched to Asset Hub. As mentioned above, the Ethereum submitV1 and submitV2 contracts should also check the operating mode, so P→E messages that were already relayed cannot process on Ethereum while the bridge is halted. The new storage defaults to Normal so the change is backwards-compatible. No other existing Snowbridge pallet interfaces change.

Prior Art and References

polkadot-fellows/runtimes #1089, the chain-wide safe-mode and tx-pause deployment proposal.
polkadot-fellows/runtimes #1164, the AssetHub safe-mode wiring.
pallet-safe-mode and pallet-tx-pause in the Polkadot SDK.
Snowbridge Circuit Breakers RFC (PR #167), the companion preventive layer.

Unresolved Questions

These all relate to pallet config, and the decisions can be deferred to the Polkadot runtime config if necessary:

Retry backoff: The retry interval still needs to be agreed on, perhaps 30-60 seconds, expressed as a number of blocks.
Deposit: 100k DOT matches the runtimes #1089 number, but Snowbridge halts more than a generic safe-mode would. Worth a separate discussion on whether the deposit should be higher.
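
For the retry interval, converting a wall-clock target into blocks is a small piece of arithmetic, sketched below. The helper name is hypothetical and the block time is an input, not a claim about Bridge Hub's actual configuration; rounding up ensures the interval is never shorter than the target.

```rust
/// Convert a retry interval in seconds into a whole number of blocks,
/// rounding up. `block_time_secs` is an assumed runtime parameter.
pub fn seconds_to_blocks(seconds: u32, block_time_secs: u32) -> u32 {
    assert!(block_time_secs > 0, "block time must be non-zero");
    (seconds + block_time_secs - 1) / block_time_secs
}
```

Assuming a 6-second block time, 30-60 seconds corresponds to 5-10 blocks; assuming 12 seconds, it corresponds to 3-5 blocks.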

Future Directions and Related Material

Per-extrinsic granular pause as a v2 of the pallet, using pallet-tx-pause's FullNameOf<T> addressing.
Watchdog automation: off-chain monitors with funded accounts that auto-trigger on observed anomalies.
Companion RFC: the Snowbridge Circuit Breakers RFC (PR #167) specifies the preventive layer (per-asset velocity caps on the Ethereum Gateway for P→E and Asset Hub for E→P) that bounds value-at-risk during the detection-latency window this pallet does not cover.

Polkadot Fellowship RFCs