# Scheduler Pallet
> TODO: This section is still heavily under construction. Key questions about availability cores and validator assignment are still open, and the flow of the section may be contradictory or inconsistent.
The Scheduler module is responsible for two main tasks:
- Partitioning validators into groups and assigning groups to parachains.
- Scheduling parachains for each block.
It aims to achieve these tasks with these goals in mind:
- It should be possible to know at least a block ahead-of-time, ideally more, which validators are going to be assigned to which parachains.
- Parachains that have a candidate pending availability in this fork of the chain should not be assigned.
- Validator assignments should not be gameable. Malicious cartels should not be able to manipulate the scheduler to assign themselves as desired.
- High or close to optimal throughput of parachains. Work among validator groups should be balanced.
## Availability Cores
The Scheduler manages resource allocation using the concept of "Availability Cores". There will be one availability core for each lease holding parachain, and a fixed number of cores used for multiplexing on-demand parachains. Validators will be partitioned into groups, with the same number of groups as availability cores. Validator groups will be assigned to different availability cores over time.
An availability core can exist in either one of two states at the beginning or end of a block: free or occupied. A free availability core can have a lease holding or on-demand parachain assigned to it for the potential to have a backed candidate included. After backing, the core enters the occupied state as the backed candidate is pending availability. There is an important distinction: a core is not considered occupied until it is in charge of a block pending availability, although the implementation may treat scheduled cores the same as occupied ones for brevity. A core exits the occupied state when the candidate is no longer pending availability - either on timeout or on availability. A core starting in the occupied state can move to the free state and back to occupied all within a single block, as availability bitfields are processed before backed candidates. At the end of the block, there is a possible timeout on availability which can move the core back to the free state if occupied.
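To make the within-block ordering concrete, here is a minimal sketch of one core's transitions. The types and method names are invented for illustration (the pallet itself tracks occupancy as `Option<CoreOccupied>`, as shown in the Storage section below), and a single `timeout` parameter stands in for the per-kind timeouts discussed later:

```rust
/// A minimal model of a single availability core's state.
#[derive(Debug, PartialEq)]
enum CoreState {
    Free,
    /// Occupied by a backed candidate pending availability since this block.
    Occupied { pending_since: u32 },
}

impl CoreState {
    /// Availability bitfields are processed first in a block: a candidate
    /// becoming available frees its core.
    fn on_available(&mut self) {
        *self = CoreState::Free;
    }

    /// Backed candidates are processed after bitfields, so a core freed
    /// earlier in the same block can immediately become occupied again.
    fn on_backed(&mut self, now: u32) {
        debug_assert_eq!(*self, CoreState::Free);
        *self = CoreState::Occupied { pending_since: now };
    }

    /// At the end of the block, an occupied core may time out back to free.
    fn on_end_of_block(&mut self, now: u32, timeout: u32) {
        if let CoreState::Occupied { pending_since } = *self {
            if now.saturating_sub(pending_since) >= timeout {
                *self = CoreState::Free;
            }
        }
    }
}
```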
Cores are treated as an ordered list and are typically referred to by their index in that list.
## Validator Groups
Validator group assignments do not need to change very quickly. The security benefits of fast rotation are redundant with the challenge mechanism in the Approval process. Because of this, we only divide validators into groups at the beginning of the session and do not shuffle membership during the session. However, we do take steps to ensure that no particular validator group has dominance over a single lease holding parachain or on-demand parachain-multiplexer for an entire session to provide better guarantees of liveness.
Validator groups rotate across availability cores in a round-robin fashion, with rotation occurring at fixed intervals.
The i'th group will be assigned to the `(i+k)%n`'th core at any point in time, where `k` is the number of rotations that have occurred in the session, and `n` is the number of cores. This makes upcoming rotations within the same session predictable.
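As a sketch, the assignment can be computed directly from the block height (a hypothetical helper; `session_start_block` and `group_rotation_frequency` stand in for the values the pallet reads from storage and the `HostConfiguration`):

```rust
/// Core index that validator group `group_idx` is assigned to at block `now`.
fn core_for_group(
    group_idx: u32,
    now: u32,
    session_start_block: u32,
    group_rotation_frequency: u32,
    n_cores: u32,
) -> u32 {
    // `k`: the number of rotations that have occurred so far this session.
    let k = (now - session_start_block) / group_rotation_frequency;
    (group_idx + k) % n_cores
}
```

Because `k` depends only on the block height, any node can compute upcoming assignments for the remainder of the session.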
When a rotation occurs, validator groups are still responsible for distributing availability chunks for any previous
cores that are still occupied and pending availability. In practice, rotation and availability-timeout frequencies
should be set so this will only be the core they have just been rotated from. It is possible that a validator group is
rotated onto a core which is currently occupied. In this case, the validator group will have nothing to do until the
previously-assigned group finishes their availability work and frees the core or the availability process times out.
Depending on whether the core is for a lease holding parachain or an on-demand parachain, a different timeout `t` from the `HostConfiguration` will apply. Availability timeouts should only be triggered in the first `t-1` blocks after the beginning of a rotation.
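A sketch of that rule (hypothetical helper; `blocks_since_rotation` would be derived from the rotation schedule above):

```rust
/// Whether availability timeouts may fire for a core whose availability
/// period is `t` blocks. Timeouts are confined to the first `t - 1` blocks
/// of a rotation, so a group rotated onto an occupied core leaves the
/// previous group's candidate a full period to become available.
fn timeouts_enabled(blocks_since_rotation: u32, t: u32) -> bool {
    blocks_since_rotation < t.saturating_sub(1)
}
```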
## Claims
On-demand parachains operate on a system of claims. Collators purchase claims on authoring the next block of an on-demand parachain, although the purchase mechanism is beyond the scope of the scheduler. The scheduler guarantees that they'll be given at least a certain number of attempts to author a candidate that is backed. Attempts that fail during the availability phase are not counted, since ensuring availability at that stage is the responsibility of the backing validators, not of the collator. When a claim is accepted, it is placed into a queue of claims, and each claim is assigned to a particular on-demand parachain-multiplexing core in advance. Given that the current assignments of validator groups to cores are known, and the upcoming assignments are predictable, it is possible for on-demand parachain collators to know who they should be talking to now and with whom they should begin establishing connections as a fallback.
With this information, the Node-side can be aware of which on-demand parachains have a good chance of being includable within the relay-chain block and can focus any additional resources on backing candidates from those on-demand parachains. Furthermore, Node-side code is aware of which validator group will be responsible for backing that on-demand parachain. If the necessary conditions are reached for core reassignment, those candidates can be backed within the same block as the core being freed.
On-demand claims, when scheduled onto a free core, may not result in a block pending availability. This may be due to collator error, networking timeout, or censorship by the validator group. In this case, the claims should be retried a certain number of times to give the collator a fair shot.
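The retry bookkeeping might look roughly like this (a simplified stand-in for the `ParathreadEntry` struct defined in the Storage section below; `max_retries` corresponds to `config.parathread_retries`):

```rust
/// Simplified stand-in for the `ParathreadEntry` of the Storage section.
struct ParathreadEntry {
    retries: u32,
}

/// Decide whether a returned claim goes back into the queue. Attempts that
/// fail during the availability phase are not counted against the collator.
fn requeue_claim(
    mut entry: ParathreadEntry,
    failed_during_availability: bool,
    max_retries: u32,
) -> Option<ParathreadEntry> {
    if failed_during_availability {
        // Availability is the backing validators' responsibility, so the
        // collator keeps its remaining attempts.
        return Some(entry);
    }
    // The collator used up one attempt at getting a candidate backed.
    entry.retries += 1;
    (entry.retries <= max_retries).then_some(entry)
}
```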
## Storage
Utility structs:
```rust
// A claim on authoring the next block for a given parathread (on-demand parachain).
struct ParathreadClaim(ParaId, CollatorId);

// An entry tracking a parathread (on-demand parachain) claim to ensure it does not
// pass the maximum number of retries.
struct ParathreadEntry {
    claim: ParathreadClaim,
    retries: u32,
}

// A queued parathread (on-demand parachain) entry, pre-assigned to a core.
struct QueuedParathread {
    claim: ParathreadEntry,
    /// offset within the set of parathreads (on-demand parachains) ranged `0..config.parathread_cores`.
    core_offset: u32,
}

struct ParathreadQueue {
    queue: Vec<QueuedParathread>,
    /// offset within the set of parathreads (on-demand parachains) ranged `0..config.parathread_cores`.
    next_core_offset: u32,
}

enum CoreOccupied {
    // On-demand parachain
    Parathread(ParathreadEntry), // claim & retries
    Parachain,
}

enum AssignmentKind {
    Parachain,
    // On-demand parachain
    Parathread(CollatorId, u32),
}

struct CoreAssignment {
    core: CoreIndex,
    para_id: ParaId,
    kind: AssignmentKind,
    group_idx: GroupIndex,
}

// Reasons a core might be freed.
enum FreedReason {
    Concluded,
    TimedOut,
}
```
Storage layout:
```rust
/// All the validator groups. One for each core. Indices are into the `ActiveValidators` storage.
ValidatorGroups: Vec<Vec<ValidatorIndex>>;
/// A queue of upcoming parathread (on-demand parachain) claims and which core they should be mapped onto.
ParathreadQueue: ParathreadQueue;
/// One entry for each availability core. Entries are `None` if the core is not currently occupied.
/// The i'th parachain lease belongs to the i'th core, with the remaining cores all being
/// on-demand parachain-multiplexers.
AvailabilityCores: Vec<Option<CoreOccupied>>;
/// An index used to ensure that only one claim on a parathread (on-demand parachain) exists in the queue or is
/// currently being handled by an occupied core.
ParathreadClaimIndex: Vec<ParaId>;
/// The block number where the session start occurred. Used to track how many group rotations have occurred.
SessionStartBlock: BlockNumber;
/// Currently scheduled cores - free but up to be occupied.
/// The value contained here will not be valid after the end of a block.
/// Runtime APIs should be used to determine scheduled cores
/// for the upcoming block.
Scheduled: Vec<CoreAssignment>, // sorted ascending by CoreIndex.
```
## Session Change
Session changes are the only time that configuration can change, and the Configuration module's session-change logic is handled before this module's. We also lean on the behavior of the Inclusion module, which clears all its occupied cores on session change. Thus we don't have to worry about cores being occupied across session boundaries, and it is safe to re-size the `AvailabilityCores` bitfield.
Actions:
- Set `SessionStartBlock` to current block number + 1, as session changes are applied at the end of the block.
- Clear all `Some` members of `AvailabilityCores`. Return all parathread claims to the queue with retries un-incremented.
- Set `configuration = Configuration::configuration()` (see `HostConfiguration`).
- Fetch `Shared::ActiveValidators` as AV.
- Determine the number of cores & validator groups as `n_cores`. This is the maximum of
  - `paras::Parachains::<T>::get().len() + configuration.parathread_cores`
  - `n_validators / max_validators_per_core` if `configuration.max_validators_per_core` is `Some` and non-zero.
- Resize `AvailabilityCores` to have length `n_cores` with all `None` entries.
- Compute new validator groups by shuffling using a secure randomness beacon (see the sketch after this list).
  - Note that the total number of validators `V` in AV may not be evenly divided by `n_cores`.
  - The groups are selected by partitioning AV. The first `V % n_cores` groups will have `(V / n_cores) + 1` members, while the remaining groups will have `(V / n_cores)` members each.
  - Instead of using the indices within AV, which point to the broader set, indices into AV should be used. This implies that groups should have simply ascending validator indices.
- Prune the parathread (on-demand parachain) queue to remove all retries beyond `configuration.parathread_retries`.
  - Also prune all on-demand claims corresponding to de-registered parachains.
  - All pruned claims should have their entry removed from the parathread (on-demand parachain) index.
  - Assign all non-pruned claims to new cores if the number of on-demand parachain cores has changed between the `new_config` and `old_config` of the `SessionChangeNotification`.
  - Assign claims in equal balance across all cores if rebalancing, and set the `next_core` of the `ParathreadQueue` (on-demand queue) by incrementing the relative index of the last assigned core and taking it modulo the number of on-demand cores.
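A sketch of the partitioning step above, assuming AV has already been shuffled with the randomness beacon so that each group can take simply ascending indices into it (`make_groups` is a hypothetical helper, not the pallet's API):

```rust
/// Partition validator indices `0..v` (indices into AV) into `n_cores`
/// groups. The first `v % n_cores` groups receive one extra member, and
/// every group holds simply ascending indices.
fn make_groups(v: usize, n_cores: usize) -> Vec<Vec<usize>> {
    let (base, extra) = (v / n_cores, v % n_cores);
    let mut groups = Vec::with_capacity(n_cores);
    let mut next = 0;
    for i in 0..n_cores {
        let size = base + usize::from(i < extra);
        groups.push((next..next + size).collect());
        next += size;
    }
    groups
}
```

For example, `make_groups(10, 4)` produces groups of sizes 3, 3, 2, and 2.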
## Initialization
No initialization routine runs for this module.
## Finalization
No finalization routine runs for this module.
## Routines
- `add_parathread_claim(ParathreadClaim)`: Add a parathread (on-demand parachain) claim to the queue.
  - Fails if any on-demand claim on the same parachain is currently indexed.
  - Fails if the queue length is `>= config.scheduling_lookahead * config.parathread_cores`.
  - The core used for the on-demand claim is the `next_core` field of the `ParathreadQueue` (on-demand queue) plus `paras::Parachains::<T>::get().len()`. `next_core` is then updated by adding 1 and taking it modulo `config.parathread_cores`.
  - The claim is then added to the claim index.
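  As an illustrative sketch (hypothetical helper; `n_lease_cores` stands for `paras::Parachains::<T>::get().len()`):

  ```rust
  /// Pick the core for a new on-demand claim and advance `next_core`.
  /// Lease holding parachains occupy cores `0..n_lease_cores`, so the
  /// on-demand multiplexing cores begin immediately after them.
  fn assign_claim_core(next_core: &mut u32, n_lease_cores: u32, parathread_cores: u32) -> u32 {
      let core = n_lease_cores + *next_core;
      *next_core = (*next_core + 1) % parathread_cores;
      core
  }
  ```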
- `free_cores(Vec<(CoreIndex, FreedReason)>)`: Indicate previously-occupied cores which are to be considered returned and why they are being returned.
  - All freed lease holding parachain cores should be assigned to their respective parachain.
  - All freed on-demand parachain cores whose reason for freeing was `FreedReason::Concluded` should have the claim removed from the claim index.
  - All freed on-demand cores whose reason for freeing was `FreedReason::TimedOut` should have the claim added to the parathread queue (on-demand queue) again without retries incremented.
  - All freed on-demand cores should take the next on-demand parachain entry from the queue.
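  In rough terms, the per-core handling could be sketched as follows, reusing the utility structs from the Storage section (simplified: a `ParaId: PartialEq` bound is assumed, and taking the next entry from the queue is elided):

  ```rust
  fn handle_freed_core(
      occupied: CoreOccupied,
      reason: FreedReason,
      claim_index: &mut Vec<ParaId>,
      queue: &mut ParathreadQueue,
      core_offset: u32,
  ) {
      match (occupied, reason) {
          // Lease holding parachain cores go straight back to their parachain.
          (CoreOccupied::Parachain, _) => {}
          // A concluded on-demand claim is finished; drop it from the index.
          (CoreOccupied::Parathread(entry), FreedReason::Concluded) => {
              claim_index.retain(|id| *id != entry.claim.0);
          }
          // A timed-out claim is re-queued without incrementing retries.
          (CoreOccupied::Parathread(entry), FreedReason::TimedOut) => {
              queue.queue.push(QueuedParathread { claim: entry, core_offset });
          }
      }
  }
  ```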
- `schedule(Vec<(CoreIndex, FreedReason)>, now: BlockNumber)`: Schedule new core assignments, with a parameter indicating previously-occupied cores which are to be considered returned and why they are being returned.
  - Invoke `free_cores(freed_cores)`.
  - The i'th validator group will be assigned to the `(i+k)%n`'th core at any point in time, where `k` is the number of rotations that have occurred in the session, and `n` is the total number of cores. This makes upcoming rotations within the same session predictable. Rotations are based off of `now`.
- `scheduled() -> Vec<CoreAssignment>`: Get currently scheduled core assignments.
- `occupied(Vec<CoreIndex>)`: Note that the given cores have become occupied.
  - Behavior undefined if any given cores were not scheduled.
  - Behavior undefined if the given cores are not sorted ascending by core index.
  - This clears them from `Scheduled` and marks each corresponding `core` in the `AvailabilityCores` as occupied.
  - Since both the availability cores and the newly-occupied cores lists are sorted ascending, this method can be implemented efficiently (see the sketch below).
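  A single merged pass over the two sorted lists suffices, sketched here over simplified types (`u32` core indices and `Option<()>` occupancy):

  ```rust
  /// Drain the entries of `scheduled` matching `newly_occupied` (both sorted
  /// ascending) and mark the corresponding availability cores as occupied.
  fn occupy(scheduled: &mut Vec<u32>, cores: &mut [Option<()>], newly_occupied: &[u32]) {
      let mut pending = newly_occupied.iter().peekable();
      scheduled.retain(|&core| {
          if pending.next_if(|&&c| c == core).is_some() {
              cores[core as usize] = Some(()); // now occupied
              false // remove from `Scheduled`
          } else {
              true
          }
      });
  }
  ```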
- `group_validators(GroupIndex) -> Option<Vec<ValidatorIndex>>`: Return all validators in a given group, if the group index is valid for this session.
- `availability_timeout_predicate() -> Option<impl Fn(CoreIndex, BlockNumber) -> bool>`: Returns an optional predicate that should be used for timing out occupied cores. If `None`, no timing-out should be done. The predicate accepts the index of the core and the block number since which it has been occupied. The predicate should be implemented based on the time since the last validator group rotation and the respective parachain timeouts, i.e. only within `max(config.chain_availability_period, config.thread_availability_period)` of the last rotation would this return `Some`.
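  One way this might be constructed (a sketch with hypothetical parameters; the pallet would read these values from storage and the `HostConfiguration`):

  ```rust
  /// Build the timeout predicate, or `None` if no timeout may fire right now.
  /// `chain_period` / `thread_period` correspond to the configuration's
  /// `chain_availability_period` and `thread_availability_period`.
  fn availability_timeout_predicate(
      now: u32,
      blocks_since_rotation: u32,
      chain_period: u32,
      thread_period: u32,
      n_lease_cores: u32,
  ) -> Option<impl Fn(u32, u32) -> bool> {
      // Outside the window following a rotation, no timeouts are triggered.
      if blocks_since_rotation >= chain_period.max(thread_period) {
          return None;
      }
      Some(move |core: u32, occupied_since: u32| {
          // Cores below the lease count belong to lease holding parachains.
          let period = if core < n_lease_cores { chain_period } else { thread_period };
          now.saturating_sub(occupied_since) >= period
      })
  }
  ```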
- `group_rotation_info(now: BlockNumber) -> GroupRotationInfo`: Returns a helper for determining group rotation.
- `next_up_on_available(CoreIndex) -> Option<ScheduledCore>`: Return the next thing that will be scheduled on this core, assuming it is currently occupied and the candidate occupying it became available. Returns in `ScheduledCore` format (todo: link to Runtime APIs page; linkcheck doesn't allow this right now). For lease holding parachains, this is always the ID of the parachain and no specified collator. For on-demand parachains, this is based on the next item in the `ParathreadQueue` (on-demand queue) assigned to that core, and is `None` if there isn't one.
- `next_up_on_time_out(CoreIndex) -> Option<ScheduledCore>`: Return the next thing that will be scheduled on this core, assuming it is currently occupied and the candidate occupying it timed out. Returns in `ScheduledCore` format (todo: link to Runtime APIs page; linkcheck doesn't allow this right now). For parachains, this is always the ID of the parachain and no specified collator. For on-demand parachains, this is based on the next item in the `ParathreadQueue` (on-demand queue) assigned to that core, or if there isn't one, the claim that is currently occupying the core. Otherwise `None`.
- `clear()`:
  - Free all scheduled cores and return on-demand claims to the queue, with retries incremented. Skip on-demand parachains which no longer exist under paras.