Statement Distribution (Legacy)
This describes the legacy, backwards-compatible version of the Statement Distribution subsystem.
Note: All the V1 (legacy) code was extracted to a `legacy_v1` module of the `statement-distribution` crate, which doesn't alter any logic. V2 (new protocol) peers also run `legacy_v1` and communicate with V1 peers using V1 messages and with V2 peers using V2 messages. Once the runtime upgrade goes through on all networks, this `legacy_v1` code will no longer be triggered; it becomes vestigial and can be removed.
Overview
The Statement Distribution Subsystem is responsible for distributing statements about seconded candidates between validators.
Protocol
`PeerSet`: `Validation`

Input:

- `NetworkBridgeUpdate(update)`
- `StatementDistributionMessage`

Output:

- `NetworkBridge::SendMessage(PeerId, message)`
- `NetworkBridge::SendRequests(StatementFetchingV1)`
- `NetworkBridge::ReportPeer(PeerId, cost_or_benefit)`
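For orientation, the input and output above can be pictured roughly as the following Rust shapes. This is an illustrative sketch only: the type and variant names are simplified stand-ins, not the actual definitions in the node subsystem crates.

```rust
// Illustrative shapes only: the real types live in the Polkadot node
// subsystem crates and carry more information than shown here.

type PeerId = [u8; 32]; // stand-in for the network peer identifier
type Hash = [u8; 32];   // stand-in for relay-parent / candidate hashes

/// Messages flowing into the subsystem (simplified).
enum Input {
    /// View, topology, and peer-message updates relayed by the network bridge.
    NetworkBridgeUpdate(Vec<u8>),
    /// Subsystem-level requests, e.g. `Share` issued by Candidate Backing.
    StatementDistributionMessage(Vec<u8>),
}

/// Messages flowing out of the subsystem (simplified).
enum Output {
    /// Gossip a statement (or large-statement metadata) to a peer.
    SendMessage(PeerId, Vec<u8>),
    /// Issue `StatementFetchingV1` requests for full candidate data.
    SendRequests(Vec<(PeerId, Hash)>),
    /// Raise or lower the peer's reputation with a cost or benefit.
    ReportPeer(PeerId, i32),
}
```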
Functionality
Implemented as a gossip protocol. Handles updates to our view and peers' views. Neighbor packets are used to inform peers which chain heads we are interested in data for.
The Statement Distribution Subsystem is responsible for distributing signed statements that we have generated and for forwarding statements generated by other validators. It also detects a variety of Validator misbehaviors for reporting to the Provisioner Subsystem. During the Backing stage of the inclusion pipeline, Statement Distribution is the main point of contact with peer nodes. On receiving a signed statement from a peer in the same backing group, assuming the peer receipt state machine is in an appropriate state, it sends the Candidate Receipt to the Candidate Backing subsystem to handle the validator's statement.

On receiving `StatementDistributionMessage::Share` we make sure to send the statement to our backing group in addition to random other peers, to ensure a fast backing process and that all statements are obtained quickly for distribution.
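A minimal sketch of this recipient selection; `GOSSIP_FANOUT`, `backing_group_peers`, and `all_connected_peers` are hypothetical names chosen for illustration, not the actual implementation:

```rust
use std::collections::HashSet;

type PeerId = [u8; 32]; // stand-in for the network peer identifier

/// Hypothetical number of extra random peers outside the backing group.
const GOSSIP_FANOUT: usize = 4;

/// Sketch: recipients for a statement shared via `Share`. Every peer in our
/// backing group receives it, plus a few other peers so it spreads quickly.
fn select_recipients(
    backing_group_peers: &HashSet<PeerId>,
    all_connected_peers: &HashSet<PeerId>,
) -> HashSet<PeerId> {
    let mut recipients = backing_group_peers.clone();
    // A real implementation would sample these at random; for brevity this
    // sketch just takes the first few peers outside the backing group.
    recipients.extend(
        all_connected_peers
            .difference(backing_group_peers)
            .take(GOSSIP_FANOUT)
            .copied(),
    );
    recipients
}
```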
This subsystem tracks equivocating validators and stops accepting information from them. It establishes a data-dependency order:

- In order to receive a `Seconded` message we must have the corresponding chain head in our view.
- In order to receive a `Valid` message we must have received the corresponding `Seconded` message.

We respect this data-dependency order from our peers by respecting their views, as sketched below. This subsystem is responsible for checking message signatures.
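A minimal sketch of this acceptance rule, using simplified statement and view types that are not the actual implementation:

```rust
use std::collections::{HashMap, HashSet};

type Hash = [u8; 32]; // stand-in for relay-parent / candidate hashes

/// Simplified statement shape, for illustration only.
enum Statement {
    Seconded { relay_parent: Hash, candidate_hash: Hash },
    Valid { relay_parent: Hash, candidate_hash: Hash },
}

/// Sketch of the data-dependency check applied to incoming statements.
fn may_accept(
    stmt: &Statement,
    our_view: &HashSet<Hash>,                // chain heads in our view
    seconded: &HashMap<Hash, HashSet<Hash>>, // relay-parent -> seconded candidates
) -> bool {
    match stmt {
        // A `Seconded` statement is only accepted for chain heads in our view.
        Statement::Seconded { relay_parent, .. } => our_view.contains(relay_parent),
        // A `Valid` statement is only accepted once the matching `Seconded`
        // statement has been received.
        Statement::Valid { relay_parent, candidate_hash } => seconded
            .get(relay_parent)
            .map_or(false, |c| c.contains(candidate_hash)),
    }
}
```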
The Statement Distribution subsystem sends statements to peer nodes.
Peer Receipt State Machine
There is a very simple state machine which governs which messages we are willing to receive from peers. Not depicted in the state machine: on initial receipt of any `SignedFullStatement`, validate that the provided signature does in fact sign the included data. Note that each individual parablock candidate gets its own instance of this state machine; it is perfectly legal to receive a `Valid(X)` before a `Seconded(Y)`, as long as a `Seconded(X)` has been received.
A: Initial State. Receive `SignedFullStatement(Statement::Second)`: extract `Statement`, forward to Candidate Backing, proceed to B. Receive any other `SignedFullStatement` variant: drop it.

B: Receive any `SignedFullStatement`: check signature and determine whether the statement is new to us. If new, forward to Candidate Backing and circulate to other peers. Receive `OverseerMessage::StopWork`: proceed to C.

C: Receive any message for this block: drop it.
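The state machine can be summarized with the following illustrative Rust sketch. Signature checking and the "is it new to us" check are omitted, and the `StopWork` transition from state A is not depicted above and is assumed here.

```rust
/// Per-candidate peer receipt state machine (illustrative names only).
enum State {
    A, // initial: only a `Seconded` statement is acceptable
    B, // a `Seconded` was received: further statements may follow
    C, // work on this block has stopped: drop everything
}

enum Event {
    Seconded, // SignedFullStatement(Statement::Second)
    Other,    // any other SignedFullStatement variant
    StopWork, // the overseer told us to stop work on this block
}

enum Action {
    ForwardToBacking, // extract the statement and hand it to Candidate Backing
    ForwardAndGossip, // if new: forward to backing and circulate to peers
    Drop,             // discard the message
}

fn step(state: State, event: Event) -> (State, Action) {
    match (state, event) {
        (State::A, Event::Seconded) => (State::B, Action::ForwardToBacking),
        (State::A, Event::Other) => (State::A, Action::Drop),
        // Assumed: stopping work in the initial state also ends processing.
        (State::A, Event::StopWork) => (State::C, Action::Drop),
        (State::B, Event::StopWork) => (State::C, Action::Drop),
        (State::B, _) => (State::B, Action::ForwardAndGossip),
        (State::C, _) => (State::C, Action::Drop),
    }
}
```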
For large statements (see below), we also keep track of the total received large statements per peer and enforce a hard limit on that number for flood protection. This is necessary because in the current code we only forward statements once we have all the data, so flood protection for large statements is a bit more subtle. This will become an obsolete problem once off-chain code upgrades are implemented.
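A sketch of that per-peer hard limit; the constant name and value are illustrative, not the ones used in the code:

```rust
use std::collections::HashMap;

type PeerId = [u8; 32]; // stand-in for the network peer identifier

/// Hypothetical per-peer budget; the real limit lives in the subsystem code.
const MAX_LARGE_STATEMENTS_PER_PEER: usize = 16;

/// Sketch of the flood-protection counter for large statements. Returns
/// `true` if the statement may be processed, `false` if the peer exhausted
/// its budget and should be penalized.
fn note_large_statement(counts: &mut HashMap<PeerId, usize>, peer: PeerId) -> bool {
    let count = counts.entry(peer).or_insert(0);
    *count += 1;
    *count <= MAX_LARGE_STATEMENTS_PER_PEER
}
```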
Peer Knowledge Tracking
The peer receipt state machine implies that for parsimony of network resources, we should model the knowledge of our peers, and help them out. For example, let's consider a case with peers A, B, and C, validators X and Y, and candidate M. A sends us a `Statement::Second(M)` signed by X. We've double-checked it, and it's valid. While we're checking it, we receive a copy of X's `Statement::Second(M)` from B, along with a `Statement::Valid(M)` signed by Y.

Our response to A is just the `Statement::Valid(M)` signed by Y. However, we haven't heard anything about this from C. Therefore, we send it everything we have: first a copy of X's `Statement::Second`, then Y's `Statement::Valid`.

This system implies a certain level of duplication of messages--we received X's `Statement::Second` from both our peers, and C may experience the same--but it minimizes the degree to which messages are simply dropped.
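A sketch of how this knowledge tracking could be modeled; the `Fingerprint` and `PeerKnowledge` types are illustrative stand-ins, not the actual data structures:

```rust
use std::collections::HashSet;

type Hash = [u8; 32]; // stand-in for a candidate hash

/// A compact fingerprint of a statement: (kind, candidate, validator index).
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Kind { Seconded, Valid }

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Fingerprint { kind: Kind, candidate: Hash, validator_index: u32 }

/// Per-peer knowledge: the union of what we sent them and what they sent us.
#[derive(Default)]
struct PeerKnowledge { known: HashSet<Fingerprint> }

impl PeerKnowledge {
    fn note_sent(&mut self, fp: Fingerprint) { self.known.insert(fp); }
    fn note_received(&mut self, fp: Fingerprint) { self.known.insert(fp); }
}

/// Sketch: given everything we hold about a candidate, compute which
/// statements a given peer still needs, preserving `Seconded`-first order.
fn statements_to_send(
    all_statements: &[Fingerprint], // ordered: Seconded first, then Valid
    peer: &PeerKnowledge,
) -> Vec<Fingerprint> {
    all_statements
        .iter()
        .filter(|fp| !peer.known.contains(*fp))
        .copied()
        .collect()
}
```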
No jobs. We follow view changes from the `NetworkBridge`, which in turn is updated by the overseer.
Equivocations and Flood Protection
An equivocation is a double-vote by a validator. The Candidate Backing Subsystem is better-suited than this one to detect equivocations as it adds votes to quorum trackers.
At this level, we are primarily concerned about flood-protection, and to some extent, detecting equivocations is a part of that. In particular, we are interested in detecting equivocations of `Seconded` statements. Since every other statement is dependent on `Seconded` statements, ensuring that we only ever hold a bounded number of `Seconded` statements is sufficient for flood-protection.
The simple approach is to say that we only receive up to two `Seconded` statements per validator per chain head.
However, the marginal cost of equivocation, conditional on having already equivocated, is close to 0, since a single
double-vote offence is counted as all double-vote offences for a particular chain-head. Even if it were not, there is
some amount of equivocations that can be done such that the marginal cost of issuing further equivocations is close to
0, as there would be an amount of equivocations necessary to be completely and totally obliterated by the slashing
algorithm. We fear the validator with nothing left to lose.
With that in mind, this simple approach has a caveat worth digging deeper into.
First: We may be aware of two equivocated `Seconded` statements issued by a validator. A totally honest peer of ours can also be aware of one or two different `Seconded` statements issued by the same validator. And yet another peer may be aware of one or two more `Seconded` statements. And so on. This interacts badly with pre-emptive sending logic. Upon sending a `Seconded` statement to a peer, we will want to pre-emptively follow up with all statements relative to that candidate. Waiting for acknowledgment introduces latency at every hop, so that is best avoided. What can happen is that upon receipt of the `Seconded` statement, the peer will discard it as it falls beyond the bound of 2 that it is allowed to store. It cannot store anything in memory about discarded candidates as that would introduce a DoS vector. Then, the peer would receive from us all of the statements pertaining to that candidate, which, from its perspective, would be undesired - they are data-dependent on the `Seconded` statement we sent them, but they have erased all record of that from their memory. Upon receiving a potential flood of undesired statements, this 100% honest peer may choose to disconnect from us. In this way, an adversary may be able to partition the network with careful distribution of equivocated `Seconded` statements.
The fix is to track, per-peer, the hashes of up to 4 candidates per validator (per relay-parent) that the peer is aware of. It is 4 because we may send them 2 and they may send us 2 different ones. We track the data that they are aware of as the union of things we have sent them and things they have sent us. If we receive a 1st or 2nd `Seconded` statement from a peer, we note it in the peer's known candidates even if we do disregard the data locally. And then, upon receipt of any data dependent on that statement, we do not reduce that peer's standing in our eyes, as the data was not undesired.
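A sketch of that per-peer tracking, with illustrative names; the bound of 4 follows directly from the reasoning above:

```rust
use std::collections::{HashMap, HashSet};

type Hash = [u8; 32];       // stand-in for relay-parent / candidate hashes
type ValidatorIndex = u32;  // stand-in for the validator index

/// Up to 4 candidate hashes per validator, per relay-parent, per peer:
/// 2 that we may have sent plus 2 different ones they may have sent us.
const MAX_KNOWN_CANDIDATES_PER_VALIDATOR: usize = 4;

/// Sketch of which `Seconded` candidates a single peer is aware of from a
/// given validator at a given relay-parent. Names are illustrative only.
#[derive(Default)]
struct KnownSeconded {
    // (relay_parent, validator) -> candidate hashes the peer knows about
    known: HashMap<(Hash, ValidatorIndex), HashSet<Hash>>,
}

impl KnownSeconded {
    /// Record a candidate as known to the peer (whether we sent it or they did).
    /// Returns `false` if the per-validator bound is already exhausted.
    fn note(&mut self, relay_parent: Hash, validator: ValidatorIndex, candidate: Hash) -> bool {
        let entry = self.known.entry((relay_parent, validator)).or_default();
        if entry.contains(&candidate) {
            return true;
        }
        if entry.len() >= MAX_KNOWN_CANDIDATES_PER_VALIDATOR {
            return false;
        }
        entry.insert(candidate);
        true
    }

    /// Is a statement data-dependent on `candidate` expected from this peer?
    fn is_known(&self, relay_parent: Hash, validator: ValidatorIndex, candidate: Hash) -> bool {
        self.known
            .get(&(relay_parent, validator))
            .map_or(false, |set| set.contains(&candidate))
    }
}
```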
There is another caveat to the fix: we don't want to allow the peer to flood us because it has set things up in a way that it knows we will drop all of its traffic. We also track how many statements we have received per peer, per candidate, and per chain-head. This is any statement concerning a particular candidate: `Seconded`, `Valid`, or `Invalid`. If we ever receive a statement from a peer which would push any of these counters beyond twice the number of validators at the chain-head, we begin to lower the peer's standing and eventually disconnect. This bound is a massive overestimate and could be reduced to twice the number of validators in the corresponding validator group. It is worth noting that the goal at the time of writing is to ensure any finite bound on the amount of stored data, as any equivocation results in a large slash.
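A sketch of those counters and the `2 * n_validators` bound; the structure and names are illustrative only:

```rust
use std::collections::HashMap;

type PeerId = [u8; 32]; // stand-in for the network peer identifier
type Hash = [u8; 32];   // stand-in for relay-parent / candidate hashes

/// Sketch of the per-peer, per-candidate, per-chain-head statement counters
/// used for flood protection.
#[derive(Default)]
struct FloodCounters {
    // (peer, relay_parent, candidate) -> number of statements received
    received: HashMap<(PeerId, Hash, Hash), usize>,
}

impl FloodCounters {
    /// Count a statement; returns `false` when the peer crosses the bound of
    /// twice the validators at the chain-head and should have its reputation
    /// lowered (and eventually be disconnected).
    fn note_statement(
        &mut self,
        peer: PeerId,
        relay_parent: Hash,
        candidate: Hash,
        n_validators_at_head: usize,
    ) -> bool {
        let count = self.received.entry((peer, relay_parent, candidate)).or_insert(0);
        *count += 1;
        *count <= 2 * n_validators_at_head
    }
}
```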
Large statements
Seconded statements can become quite large, for example on parachain runtime upgrades. For this reason, there exists a `LargeStatement` constructor for the `StatementDistributionMessage` wire message, which only contains light metadata of a statement. The actual candidate data is not included. This message type is used whenever a message is deemed large. The receiver of such a message needs to request the actual payload via request/response by means of a `StatementFetchingV1` request.
This is necessary as distribution of a large payload (megabytes) via gossip would make the network collapse and timely distribution of statements would no longer be possible. By using request/response it is ensured that each peer only transfers large data once. We take care to detect an overloaded peer early and immediately move on to a different peer for fetching the data. This mechanism should result in a good load distribution and therefore a rather optimal distribution path.
With these optimizations, distribution of payloads of up to 3 to 4 MB in size should work with Kusama validator specifications. For scaling up even more, runtime upgrades and message passing should be done off-chain at some point.
Flood protection considerations: to make DoS attacks on this subsystem slightly harder, nodes will only respond to large statement requests when they previously notified that peer via gossip about that statement. It is therefore not possible to DoS nodes at scale by requesting candidate data over and over again.
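A sketch of the responder-side check, assuming a hypothetical structure that records which peers we gossiped each large statement's metadata to:

```rust
use std::collections::{HashMap, HashSet};

type PeerId = [u8; 32]; // stand-in for the network peer identifier
type Hash = [u8; 32];   // stand-in for a candidate hash

/// Illustrative bookkeeping for serving large-statement requests; not the
/// actual implementation.
#[derive(Default)]
struct LargeStatementResponder {
    // candidate hash -> peers we have gossiped metadata about it to
    notified: HashMap<Hash, HashSet<PeerId>>,
}

impl LargeStatementResponder {
    /// Record that we gossiped metadata about `candidate` to `peer`.
    fn note_notified(&mut self, candidate: Hash, peer: PeerId) {
        self.notified.entry(candidate).or_default().insert(peer);
    }

    /// Only serve a `StatementFetchingV1`-style request if we previously
    /// notified that peer about the statement; otherwise ignore it, which
    /// keeps repeated candidate-data requests from being a cheap DoS vector.
    fn should_serve(&self, candidate: Hash, peer: PeerId) -> bool {
        self.notified
            .get(&candidate)
            .map_or(false, |peers| peers.contains(&peer))
    }
}
```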