Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Spec: Draft for top-level p2p spec #9732

Draft
wants to merge 425 commits into
base: main
Choose a base branch
from
Draft

Conversation

josef-widder
Copy link
Contributor

This is the beginning of a high-level p2p spec addressing #9573 for discussions. I started top down trying to understand what p2p is supposed to do rather than how p2p should do things. This collects findings from several discussion.


PR checklist

  • Tests written/updated, or no tests needed
  • CHANGELOG_PENDING.md updated, or no changelog entry needed
  • Updated relevant documentation (docs/) and code comments, or no
    documentation updates needed

crypto-facs and others added 30 commits June 24, 2021 15:01
* add parameters to byzantine send action

* make net not trusted

it's not necessary since for proofs Ivy will assume that the environment
does not break action preconditions

* use require instead of assume

it seems that assume is not checked when other isolates call!

* add comment

* add comment

* run with random seed

* make domain model extractable to C++

* substitute require for assume

assumes in an action are not checked when the action is called! I.e.
they place no requirement on the caller; we're just assuming that the
caller is going to do the right thing. This wasn't very important here
but it leade to a minor inconsistency slipping through.

* make the net isolate not trusted

there was no need for it

* add tendermint_test.ivy

contains a simple test scenario that show that the specification is no
vacuuous

* update comment

* add comments

* throw if trying to parse nset value in the repl

* add comment

* minor refactoring
Bumps [gaurav-nelson/github-action-markdown-link-check](https://github.com/gaurav-nelson/github-action-markdown-link-check) from 1.0.12 to 1.0.13.
- [Release notes](https://github.com/gaurav-nelson/github-action-markdown-link-check/releases)
- [Commits](gaurav-nelson/github-action-markdown-link-check@1.0.12...1.0.13)

---
updated-dependencies:
- dependency-name: gaurav-nelson/github-action-markdown-link-check
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/stale](https://github.com/actions/stale) from 3.0.19 to 4.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](actions/stale@v3.0.19...v4)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* abci: clarify what abci stands for

* link to abci type protos.
* abci: clarify connection use in-process

* Update abci.md

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* invert abci explanations

* lint++

* lint++

* lint++

* lint++

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
* abci: points of clarification ahead of v0.1.0

* lint++

* typo

* lint++

* double word score

* grammar

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* Update spec/abci/abci.md

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* pr feedback

* wip

* update non-zero status code docs

* fix event description

* update CheckTx description

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
* Update supervisor_001_draft.md

If the only node in the *FullNodes* set is the primary, that was just deemed faulty, we can't find honest primary.

* Update supervisor_001_draft.md
* wip

* wip

* wip

* remove comments in favor of gh comments

* wip

* udpates to language, should must etc

* Apply suggestions from code review

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>

* remove tendermint cache description

Co-authored-by: M. J. Fromberger <fromberger@interchain.io>
* add missed proto files

* add abci changes

* rename blockchain to blocksync

* Update proto/tendermint/abci/types.proto

Co-authored-by: Callum Waters <cmwaters19@gmail.com>

Co-authored-by: Callum Waters <cmwaters19@gmail.com>
* [Rebased to v0.34.x] abci: PrepareProposal (#6544)

* fixed cherry-pick

* proto changes

* make proto-gen

* UT fixes

* generate Client directive

* mockery

* App fixes

* Disable 'modified tx' hack

* mockery

* Make format

* Fix lint

Co-authored-by: Marko <marbar3778@yahoo.com>
This test would fail if run with "go test -count=2" because it uses a
fixed address and was not closing the server, so the subsequent run
could not bind to the address.

While closing the server is correct, it would probably be better if the
API was able to report the bound address so that we could pass
"localhost:0" for an anonymous port. But I am currently focusing on test
cleanup, not ready to change any existing APIs.
* libs/pubsub/query: specify peg version in go.mod

The code to generate the pubsub queries was dependent on an unspecified
version of the peg tool. This brings peg into go.mod so it is on a fixed
version. This should also enable dependabot to notify us of future
updates to peg.

The version of query.peg.go generated from the current version of peg
correctly contains the special "Code generated by..." line to indicate
to other tools that the file is automatically generated and should
therefore be excluded from linters, etc.

I removed the make target as there were no git grep results referencing
"gen_query_parser"; directly running "go generate" is a reasonable
expectation in Go projects.

Now that "go run" is module aware, I would typically use "go run" inside
the go:generate directive, but in this case we go build to a gitignore-d
directory in order to work around the nondeterministic output detailed
in pointlander/peg#129.

* libs/pubsub/query: check error from (*QueryParser).Init()

The newly generated peg code returns an error from Init(); the previous
version was niladic.

Co-authored-by: Sam Kleinman <garen@tychoish.com>
@josef-widder josef-widder self-assigned this Nov 21, 2022
@josef-widder josef-widder changed the title Josef/i9573 josef p2p Draft for top-level p2p spec Nov 21, 2022
@josef-widder josef-widder changed the title Draft for top-level p2p spec spec: Draft for top-level p2p spec Nov 21, 2022
@josef-widder josef-widder changed the title spec: Draft for top-level p2p spec WIP Spec: Draft for top-level p2p spec Nov 21, 2022
Copy link
Contributor

@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the introduction of the document. Possibly need of better definitions for used terms.

spec/p2p/p2p-top-level.md Outdated Show resolved Hide resolved
spec/p2p/p2p-top-level.md Outdated Show resolved Hide resolved
spec/p2p/p2p-top-level.md Outdated Show resolved Hide resolved
- (correct) nodes that follow the protocol described here
- (potentially) adversarial nodes whose behavior deviates to harm the system
- (correct) nodes that don't follow the protocol to shield themselves but behave in a "nice way"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth touching on "connectedness" in this section? Maybe some definitions that give an idea of how the number connections that a node keeps to peers, together with the "amount of bad peers" (or probability of a peer to be bad) has an impact on the global properties we want to achieve (e.g. a message reaches all full nodes eventually)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I think there are basic implicit assumptions/requirements in the current solution, e.g., between any two correct validators there is a path consisting only of correct full nodes. (Most likely even more paths). This might be a requirement that the p2p needs to provide to the reactors, and it can only do so, if the underlying network has even stronger properties.

Copy link
Contributor

@cason cason Nov 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, I hardly can see this property being ensured, it is probabilistically achieved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right @sergio-mena, in that these properties cannot be guaranteed, so maybe we shouldn't try. I mean, the P2P should only guarantee a best effort guarantee of conectedness, and then we call out that if by some external forces other guarantees are made, such as the existence of a GST and a maximum number of byzantine nodes, then and stronger properties can be derived, like eventual delivery of messages from some correct processes to some correct processes. This way we modularize the guarantees.

TODO:
- Reactor API
- Network? How do we communicate with other nodes
- Discuss that validators run special set-up, and manage their own neighborhood (hide behind sentry nodes).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Although I'd simplify to say "if you configure a node to prevent it to normally connect to peers " (validator<-->sentry) then only the normally connected node (the sentry) can expect the properties the overlay network provides

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we should use something like "normally" in the definitions of the specification. At this points I think it makes sense to talk about validators to motivate the definition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of defining a public network, the overlay, fully using p2p and PEX, and a private network, where each operator manually configures nodes. In fact, validators also have their IDs set as "private", so that no one learns its actual address.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sergio-mena says: let's drop the term "validators" from this spec.

Talk about the public network of full nodes. Explain the validators need to make sure to be connected to the public network

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cason says:

  1. We assume validators are somehow connected to at least a correct public node

From this point we reason about correct public nodes

> Even if at every time *t* we satisfy Point 1, if the overlays at times *t* and *t+1* are totally different, it might be hard to implement decent communication on top of it. E.g., Consensus gossip requires a neighbor to know its neighbors *k* state so that it can send the message to *k* that help *k* to advance. If *k* is connected only one second per hour, this is not feasible.
3. Openness. It is always the case that new nodes can be added to the system
> Assuming 1. and 2. holds, this means, there must always be nodes that are willing to add connections to new peers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something about self-healing? (not sure, just thinking out loud)
Something like, "the network overlay can recover from important crippling network events (e.g. temporary partitions)"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I wrote the properties 1.-3. having somewhat "good periods" in mind. The question as I see it now when I look at your comment is whether there is a relation between self healing and openness. But perhaps the connection is rather in the solution side. We need to think more ;-)


> should be English and precise. will be accompanied with a TLA spec.

TODO: This seems to be a research question. Perhaps we can find some simple properties by looking at the peer-to-peer systems academic literature from several years ago?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found the following quote while doing a quick research "I cannot define gossip, but I can recognize it when I see it”

Copy link
Contributor

@sergio-mena sergio-mena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josef-widder, thanks for putting together this draft top-level spec.
I added a few comments (mainly with some ideas).
I like the overall structure. IMO it's an excellent starting point: this document structure can be helpful on execution (i.e., who focuses on what and when). I think we could even structure our tracking issue (#9573) according to this document structure.

spec/p2p/p2p-top-level.md Outdated Show resolved Hide resolved
spec/p2p/p2p-top-level.md Outdated Show resolved Hide resolved
# Part I - A Tendermint node

TODO:
- perhaps we should survey here what we think are the expectations of the reactors from p2p?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the reactors be exposed or have a say in the overlay? Could we have multiple ones in parallel? For example, could we use a chord like overlay for 1-to-1 communication but a fat tree rooted at the proposer for proposal broadcast? The p2p could ask the reactors for hints on how to propagate messages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • no duplication
  • no corruption

the following are "low level" and not required as long as messages are delivered.

  • ability to prioritize messages.
  • ability to make messages stale, for example, right after sending MSG1, still in the outbound buffers, send a MSG2, which makes MSG1 useless and dropped.
  • at most once delivery
  • channel isolation (or some best effort)
  • to respect back pressure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now we will stay with "one overlay for all". I will make this explicit in the write-up.

p2p properties (single hop, by channel)

  • no corruption
  • prioritization

seems that currently message handling by p2p is best effort

spec/p2p/p2p-top-level.md Show resolved Hide resolved
- consensus might need for liveness that neighborhood is stable
- mempool might just need that a transaction can reach each validator within a reasonable amount of time (this might be achievable if at no point in time the current graph is connected, but, e.g., something along the lines that the union of the graphs over some period in time is connected.)
- What does p2p expect from the reactors? (don't falsely report bad nodes; this puts requirements on the reactors and perhaps/likely also on the application running on top of ABCI)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know where exactly to put it, but it would be a good property not to leak information to other nodes, for example, the internal topology of a validator network.


> should be English and precise. will be accompanied with a TLA spec.

TODO: This seems to be a research question. Perhaps we can find some simple properties by looking at the peer-to-peer systems academic literature from several years ago?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found the following quote while doing a quick research "I cannot define gossip, but I can recognize it when I see it”

- I/O
- dispatch messages incoming from the network to the reactors
- send messages incoming from the reactors to the network (the peers the messages should go to)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • do not forward messages already known to potential receivers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this is currently implemented at reactor level, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reactors try to filter what to send, based on the deduced state of the other nodes, but not based one what messages they have received. The first is stronger, if correctly implemented, but the latter should be easier to implement.

spec/p2p/p2p-top-level.md Show resolved Hide resolved
assumptions make sense, e.g., it is in the interest of a full node to behave
correctly

> should have clear formalization in temporal logic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one interesting property is that eventually all messages sent by correct nodes to correct nodes are delivered or they become stale (e.g., a decision is reached and becomes known)


> safety specifications / invariants in English

TODO: In a good period, *p* should stay connected with *q*.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really required? What if there are better connections?
I thought about "In a good period, connections eventually stabilize", but even in good periods there may be changes in the network that the p2p layer might want to take into account. And even if not, if we need to ensure openness and a bounded number of connections, good connections may need to be dropped.

@lasarojc
Copy link
Contributor

Wondering if there is any way we could get anonymized aggregated data from the network? For example average number of connections, average memory usage, anything that we could use to study the network while not revealing too much to malicious entities.

josef-widder and others added 3 commits November 25, 2022 14:18
Co-authored-by: Lasaro Camargos <lasaro@informal.systems>
Co-authored-by: Lasaro Camargos <lasaro@informal.systems>
- proposers are not disconnected?
- mempool might just need that a transaction can reach each validator within a reasonable amount of time (this might be achievable if at no point in time the current graph is connected, but, e.g., something along the lines that the union of the graphs over some period in time is connected.)
> talk about the 1-to-1 and the 1-to-many/all delivery guarantees needed by the protocols (do this for each protocol. discuss how these requirements translate into what p2p should guarantee)
- What does p2p expect from the reactors? (don't falsely report bad nodes; this puts requirements on the reactors and perhaps/likely also on the application running on top of ABCI)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • to provide reasonable parameters to drive the connections of peers.

    These parameters can be low level (like what we have now, i.e., number of outgoing connections, number of incoming connections, permanent peers) or higher level (e.g., form a tree, for a ring with long n fingers...), which require a more elaborate PEX.

@thanethomson thanethomson added the S:wip Work in progress (prevents stalebot from automatically closing) label Nov 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:spec Component: specification S:wip Work in progress (prevents stalebot from automatically closing)
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

None yet