bitcoin: add bitcoin docs (WIP) #270

rvagg · 2020-06-12T08:12:25Z

Not complete, but it's big enough, and tedious enough, that I just want to push something. If anyone feels like reviewing it as a WIP, feedback would be appreciated, but I've got a lot more to do to connect the pieces to IPLD. Will call for reviews when I think it's ~finished.

ribasushi

Did a first pass over this. Exciting!

ribasushi · 2020-06-12T08:27:51Z

block-layer/codecs/bitcoin.md

+
+The Bitcoin format consistently uses a double-SHA2-256 hash to produce content digests. This algorithm is simply the SHA2-256 digest of a SHA2-256 digest of the raw bytes. These digests are also used publicly when referring to individual transactions and whole block graphs. The Bitcoin Core CLI as well as the many web-based block explorers allow data look-up by these addresses.
+
+When publishing these addresses, they are typically presented as big-endian in hexadecimal. To represent these in byte form on a little-endian system, they therefore need to be reversed and the hexadecimal decoded.


Since endianness is usually defined over a multibyte integer type, I am for real not sure which type of "little endian" is meant here (and casual googling doesn't help). If I see the following 128-bit payload on disk:

00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff
What is the actual value:

1. 33 22 11 00 77 66 55 44 bb aa 99 88 ff ee dd cc
2. 77 66 55 44 33 22 11 00 ff ee dd cc bb aa 99 88
3. ff ee dd cc bb aa 99 88 77 66 55 44 33 22 11 00

Something else?

Alternatively, if the on-disk structures are explicitly defined over >64-bit integer types, this needs to be called out early so folks like me get in the right mindset.

Option 3: if you read it entirely as a single 32-byte unsigned integer, the order you read it in is the reverse of how you would read it if you treated it as LE. "usually defined over a multibyte integer type" is what's being got at here, but the integer is the full 32 bytes, not some repeating sub-pattern.

The "as if" makes me think this is leaning too heavily on the "uint256" thing. I'm tempted to remove that language entirely and say it's just a byte string that, by convention, gets byte-reversed and turned into hexadecimal when presented publicly.

In truth, I never touch this "uint256" thing myself in any of my code. I treat all of these things as byte arrays and then reverse and hex-encode them whenever I need to present the value; otherwise they're only useful as byte arrays. So I guess that fact in itself suggests backing out of this concept. It's really just window dressing to make the zeros go at the start of block addresses.
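The hashing and display convention discussed in this thread can be sketched in a few lines. This is a minimal sketch assuming only Python's standard `hashlib`; the helper names `double_sha256`, `to_display_hex` and `from_display_hex` are hypothetical. It treats digests purely as byte strings, as suggested above, and checks that reversing the whole buffer is the same as reading it as one little-endian integer and writing it back out big-endian (option 3 in the question above).

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """SHA2-256 of the SHA2-256 digest of `data` (Bitcoin's block/transaction ID hash)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def to_display_hex(digest: bytes) -> str:
    """Byte-reverse the raw digest, then hex-encode it: the public, conventional form."""
    return digest[::-1].hex()

def from_display_hex(display: str) -> bytes:
    """Inverse of to_display_hex: hex-decode, then reverse back to the raw byte order."""
    return bytes.fromhex(display)[::-1]

# Reversing the whole buffer equals "read as one LE integer, write as BE".
on_disk = bytes.fromhex("00112233445566778899aabbccddeeff")
assert on_disk[::-1] == int.from_bytes(on_disk, "little").to_bytes(16, "big")
print(on_disk[::-1].hex(" "))  # ff ee dd cc bb aa 99 88 77 66 55 44 33 22 11 00

# The 80-byte genesis block header, field by field (each field little-endian).
GENESIS_HEADER = (
    bytes.fromhex("01000000")  # version 1
    + bytes(32)  # previous block hash: none for the genesis block
    + bytes.fromhex("3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a")  # merkle root, raw order
    + bytes.fromhex("29ab5f49")  # timestamp
    + bytes.fromhex("ffff001d")  # bits (difficulty target)
    + bytes.fromhex("1dac2b7c")  # nonce
)
print(to_display_hex(double_sha256(GENESIS_HEADER)))
```

The last line should print the familiar genesis block ID, `000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f`; the leading zeros only appear because of the reversal, which is the "window dressing" mentioned above.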


ribasushi · 2020-06-12T08:51:35Z

block-layer/codecs/bitcoin.md

+
+### Transactions
+
+There is at least one transaction in a Bitcoin block graph. The first transaction is called the "coinbase" and represents the miner rewards. A block graph may _only_ contain a coinbase or it may also contain a number of transactions representing the movement of coins between wallets. Each transaction contains a list of one or more "Transaction Ins" and a list of one or more "Transaction Outs" representing the flow of coins. The coinbase contains a single Transaction In that does not link to any previous transaction, and its Transaction Outs list represents the destination of the block reward. Non-coinbase transactions contain Transaction Ins representing the source of the coins being transacted, linking to previous transactions, and a list of Transaction Outs containing the details of the destination wallets.


There is at least one transaction in a Bitcoin block graph.

Technically, past ~2140, when everyone working on this is dead, this may no longer be true ;P

Do you mean that it's only true if people are transacting on Bitcoin and beyond ~2140 there may no longer be transactions? It's still going to be true as long as someone is mining Bitcoin because there's always a coinbase. There cannot exist a "bitcoin block graph" without at least one transaction!

I'm looking through Zcash right now and it's kind of sad how many coinbase-only transactions there are near the head. It makes it look like it now exists to be mined ...
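The structural rules in the hunk above (at least one transaction, the first being the coinbase, every transaction having at least one input and one output) can be sketched as a small in-memory model. This is a hypothetical Python sketch: the class, field and function names are assumptions loosely following Bitcoin Core naming, and it relies on the standard convention that a coinbase's single input references the all-zero hash with output index `0xffffffff` instead of a real previous output.

```python
from dataclasses import dataclass, field
from typing import List

NULL_HASH = bytes(32)    # 32 zero bytes: "no previous transaction"
NULL_INDEX = 0xFFFFFFFF  # output index paired with NULL_HASH in a coinbase input

@dataclass
class OutPoint:
    hash: bytes  # raw 32-byte ID of the source transaction
    index: int   # which output of that transaction is being spent

@dataclass
class TxIn:
    prevout: OutPoint
    script_sig: bytes = b""

@dataclass
class TxOut:
    value: int  # amount in satoshis
    script_pubkey: bytes = b""

@dataclass
class Transaction:
    vin: List[TxIn] = field(default_factory=list)
    vout: List[TxOut] = field(default_factory=list)

    def is_coinbase(self) -> bool:
        # Exactly one input, and it does not link to a real previous output.
        return (
            len(self.vin) == 1
            and self.vin[0].prevout.hash == NULL_HASH
            and self.vin[0].prevout.index == NULL_INDEX
        )

def check_block_transactions(txs: List[Transaction]) -> None:
    """Hypothetical structural check of a block graph's transaction list."""
    if not txs:
        raise ValueError("a block graph must contain at least one transaction")
    if not txs[0].is_coinbase():
        raise ValueError("the first transaction must be the coinbase")
    if any(tx.is_coinbase() for tx in txs[1:]):
        raise ValueError("only the first transaction may be a coinbase")
    for tx in txs:
        if not tx.vin or not tx.vout:
            raise ValueError("each transaction needs at least one input and one output")
```

A coinbase-only list such as `[coinbase]` passes this check, matching the "coinbase-only" blocks mentioned above.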


ribasushi · 2020-06-12T08:59:54Z

block-layer/codecs/bitcoin.md

+}
+
+type OutPoint struct {
+  hash Bytes # 256-bits


This, together with the #int64 below, almost makes one want to say "ipld schema integers are of arbitrary precision", and leave it up to the codecs to decide when to switch the wire representation, and when to use a language-internal bigint versus a native integer.

This has probably been discussed already, so feel free to ignore with no further discussion.
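On the wire, the `OutPoint` in the hunk above is a fixed 36-byte record: the 32-byte hash of the source transaction, kept as an opaque byte string, followed by an unsigned 32-bit little-endian output index. Decoding it needs no big-integer support; the "256-bits" annotation is about length, not numeric precision. A minimal Python sketch using `struct`; `decode_outpoint` is a hypothetical helper name.

```python
import struct
from typing import NamedTuple, Tuple

OUTPOINT_SIZE = 36  # 32-byte hash + 4-byte index

class OutPoint(NamedTuple):
    hash: bytes  # 32 raw bytes; reverse and hex-encode only for display
    index: int   # unsigned 32-bit, little-endian on the wire

def decode_outpoint(buf: bytes, offset: int = 0) -> Tuple[OutPoint, int]:
    """Decode an OutPoint starting at `offset`; returns (outpoint, offset after it)."""
    if len(buf) - offset < OUTPOINT_SIZE:
        raise ValueError("truncated OutPoint")
    txid = bytes(buf[offset:offset + 32])
    (index,) = struct.unpack_from("<I", buf, offset + 32)
    return OutPoint(txid, index), offset + OUTPOINT_SIZE
```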

ribasushi · 2020-06-12T09:08:18Z

block-layer/codecs/bitcoin.md

+* `version`: a signed 32-bit integer
+* `segwit`: is implicit and `false` for all block graphs prior to the SegWit soft fork, which occurred at a height of 481,824. After this height, the two bytes following `version` are inspected: if they are equal to `[0x0, 0x1]`, the bytes are consumed and `segwit` is `true`. If the bytes are not exactly these values, `segwit` is `false`, and the two bytes instead form the beginning of `vin` (the first byte of `vin` is part of the compact size integer, and as `vin` must contain one or more elements, it cannot be `0x00`; this is what makes the `segwit` flag reliable while maintaining backward compatibility).
+* `vin`: one or more elements, prefixed by a compact size int, then, for each element up to the size:
+  * `hash`: an unsigned 256-bit integer / a 32-byte binary string, the OutPoint transaction ID hash identifying the source transaction for the coins


This goes together with the endianness discussion above: being an integer and a string at the same time can't be a thing.
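The `segwit` detection and the compact size prefix on `vin` described in the hunk above can be sketched as follows. This is a minimal Python sketch using `struct`; the function names are hypothetical, and treating heights `>= 481,824` as post-SegWit is an assumption about how "after this height" should be read. It uses the standard compact size encoding: a first byte below `0xfd` is the value itself, while `0xfd`, `0xfe` and `0xff` announce a 2-, 4- or 8-byte little-endian integer.

```python
import struct
from typing import Tuple

SEGWIT_ACTIVATION_HEIGHT = 481_824  # assumption: this height and above may be SegWit

def read_compact_size(buf: bytes, offset: int) -> Tuple[int, int]:
    """Decode a compact size integer at `offset`; returns (value, offset after it)."""
    first = buf[offset]
    if first < 0xFD:
        return first, offset + 1
    if first == 0xFD:
        return struct.unpack_from("<H", buf, offset + 1)[0], offset + 3
    if first == 0xFE:
        return struct.unpack_from("<I", buf, offset + 1)[0], offset + 5
    return struct.unpack_from("<Q", buf, offset + 1)[0], offset + 9

def read_version_and_segwit(buf: bytes, height: int) -> Tuple[int, bool, int]:
    """Read `version` and the `segwit` flag; returns (version, segwit, offset of vin)."""
    (version,) = struct.unpack_from("<i", buf, 0)  # signed 32-bit, little-endian
    offset = 4
    segwit = False
    # A vin count can never be 0x00, so the [0x00, 0x01] marker is unambiguous.
    if height >= SEGWIT_ACTIVATION_HEIGHT and buf[offset:offset + 2] == b"\x00\x01":
        segwit = True
        offset += 2  # consume the marker and flag bytes
    return version, segwit, offset
```

After `read_version_and_segwit`, calling `read_compact_size(buf, offset)` yields the number of `vin` elements that follow.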


rvagg · 2020-06-15T04:48:02Z

Notes to self arising from discussion so far, for when I do revisions:

* Probably remove the "uint256" concept, it's a confusing mess. In practice it's just a byte array that, by convention, is presented to users as a byte-reversed hex string.
* Notes about the various levels on the gradient of what it means to "decode" a bitcoin block and what you present at the data model: from strictly the elements you pull out of the binary format, up to decoding everything, including turning linkable things into CIDs and even decoding the script into its string format (or perhaps something more advanced?).
* Work on something to clarify how ipldsch is being used. This applies broadly to our docs too; we need better language to say "this defines a structure that could be conceived of as a data model thing, but we need more details as an adjunct to talk about specifics of binary representations". That gets tricky because in these docs I'm even presenting different forms of data model things (see previous point): the raw decoded pieces vs the more advanced version that can be presented to match the Bitcoin Core RPC (i.e. convention).

mikeal · 2020-09-17T20:08:13Z

what’s the status here?

i’d like to get something up on the specs website that i can link to

rvagg · 2020-09-18T02:19:02Z

status is that each time I sit down to attack this I'm overwhelmed by the size of the task to pull it together into a coherent form that covers everything that it needs to; but it does weigh on me that it's outstanding and I need to get it closed out along with js-multiformats reworks of the codec(s).

It's not in a worthy state to even merge as a draft tbh, so you're out of luck for now but I'll try and get to it asap.

warpfork · 2021-04-11T12:13:03Z

It would be really cool to merge this, even if we want to put some disclaimer texts in somewhere. This is way more and better information than we have on this topic anywhere else, as far as I know.

rvagg · 2021-04-21T07:05:33Z

Not merge-worthy IMO; it's so far from what it should have been. I think a better approach might be to start from the reverse end, like the Filecoin (and now Ethereum) data specs, and work backward. It turned out to be really hard to work forward like I was doing it here.
tbh I'm not sure what to do with this, I don't see any time on the horizon for me to finish this out but it's one of those things that linger in the back of my head, along with the code that backs this work which is also out of date now with the ecosystem it sits within.

rvagg added 2 commits June 11, 2020 16:28

bitcoin: add bitcoin docs (206661a)

fixup! bitcoin: add bitcoin docs (7927851)

ribasushi reviewed Jun 12, 2020


This was referenced Jun 15, 2020:

ipld:schema Clarify rules-of-engagement with floats in our own schemas #271 (open)

feat: add dag-jose format #269 (merged)
