codec: pathed link #352

mikeal · 2021-01-19T22:24:28Z

Wanted to open up a discussion about this particular idea.

We’ve had conversations for a while about how to represent a link as a (CID + Path) but haven’t agreed on anything stable yet.

One thought I had was to create a codec and simple block format for pathed links.

| multicodec | multihash | utf8(path) |

type PathedLink struct {
  link Link
  path String
} representation map

You could use the identity multicodec to inline the relevant data into a single CID and end up with a “pathed link.” Of course, the data model representation would not automatically traverse unless configured to do so but that’s ok, we need the data model to remain stable anyway. This would give us a link level indicator of how to traverse and we could instrument whatever special traversal logic we might need when and where we need it and are ready for it.

We also get a very compact representation since we’re able to shave some bytes in the block format.
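A minimal sketch of the block format above, assuming unsigned LEB128 varints as used elsewhere in multiformats. The function names are hypothetical and do not come from any IPLD library. Because a multihash is self-delimiting (code, length, digest), the path is simply whatever bytes remain after it.

```python
def encode_varint(n):
    """Encode a non-negative int as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos=0):
    """Decode a varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_pathed_link_block(codec, hash_code, digest, path):
    """| multicodec | multihash | utf8(path) | (hypothetical helper)."""
    multihash = encode_varint(hash_code) + encode_varint(len(digest)) + digest
    return encode_varint(codec) + multihash + path.encode("utf-8")


def decode_pathed_link_block(block):
    """Inverse of encode_pathed_link_block (hypothetical helper)."""
    codec, pos = decode_varint(block)
    hash_code, pos = decode_varint(block, pos)
    length, pos = decode_varint(block, pos)
    digest = block[pos:pos + length]
    path = block[pos + length:].decode("utf-8")
    return codec, hash_code, digest, path


# Example: a dag-cbor (0x71) link with a sha2-256 (0x12) digest and path "a/b".
# The digest here is placeholder bytes, not a real hash.
example_digest = bytes(range(32))
example_block = encode_pathed_link_block(0x71, 0x12, example_digest, "a/b")
```

For a 32-byte digest and a 3-byte path this block is 38 bytes: one byte each for the codec, hash code, and digest length, plus the digest and the path.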

Stebalien · 2021-01-19T23:01:51Z

I take it that's just everything concatenated? Works for me! Also note: utf8(/) is the multicodec for "paths" (:trollface:), so every part of this is multicodec prefixed.

Also related: multiformats/multiformats#55.

mikeal · 2021-01-19T23:34:12Z

> Also note: utf8(/) is the multicodec for "paths" (:trollface:), so every part of this is multicodec prefixed.

Oh that is awesome!

rvagg · 2021-01-20T02:53:26Z

For clarity, can you describe the sections of bytes that end up forming the final CID? I'm not quite clear on how you're getting to the end product. Is it just | multicodec | multihash | utf8(path) |, which would be backward incompatible with the current CID parsers? Or would it be | pathedlink-multicodec | identity | multicodec | multihash | utf8(path) |, so a CID+path wrapped up in a raw+identity CID? The latter is how I'm interpreting "You could use the identity multicodec to inline the relevant data into a single CID".

mikeal · 2021-01-20T17:47:26Z

> For clarity, can you describe the sections of bytes that end up forming the final CID?

Sure.

It’s also worth pointing out that the format is essentially just a CID without the preceding 1 (CIDv1) followed by the path.

Here’s a fully inline pathed CID.

| CIDv1 | pathed-link-multicodec | identity-multicodec | identity length | link-codec-multicodec | link-hash-multicodec | link-hash-length | link-hash-bytes | utf8(path) |
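The layout above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the pathed-link codec value 0x2f is taken from the "utf8(/)" remark earlier in this thread and is not an assigned code for this purpose, and all function names are hypothetical. The decoder also rebuilds the target CID by putting the version byte back in front of the embedded codec and multihash.

```python
PATHED_LINK_CODEC = 0x2F  # assumption: utf8("/"), as floated in this thread
IDENTITY = 0x00           # identity multihash code


def encode_varint(n):
    """Encode a non-negative int as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos=0):
    """Decode a varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def inline_pathed_cid(link_codec, hash_code, digest, path):
    """Build the fully inline pathed CID as raw bytes (hypothetical helper)."""
    block = (encode_varint(link_codec) + encode_varint(hash_code)
             + encode_varint(len(digest)) + digest + path.encode("utf-8"))
    return (encode_varint(1) + encode_varint(PATHED_LINK_CODEC)
            + encode_varint(IDENTITY) + encode_varint(len(block)) + block)


def split_inline_pathed_cid(cid):
    """Return (target CIDv1 bytes, path) from an inline pathed CID."""
    version, pos = decode_varint(cid)
    codec, pos = decode_varint(cid, pos)
    hash_code, pos = decode_varint(cid, pos)
    length, pos = decode_varint(cid, pos)
    if (version, codec, hash_code) != (1, PATHED_LINK_CODEC, IDENTITY):
        raise ValueError("not an inline pathed link")
    block = cid[pos:pos + length]
    _, p = decode_varint(block)         # link codec
    _, p = decode_varint(block, p)      # link hash code
    digest_len, p = decode_varint(block, p)
    end = p + digest_len                # end of the embedded multihash
    target = encode_varint(1) + block[:end]  # restore the CIDv1 version byte
    return target, block[end:].decode("utf-8")


# Placeholder digest bytes, dag-cbor (0x71) target, sha2-256 (0x12) hash code.
digest = bytes(range(32))
pathed = inline_pathed_cid(0x71, 0x12, digest, "a/b")
target_cid, path = split_inline_pathed_cid(pathed)
```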

I should also note that we’ll need to apply some rules to the path in order to ensure determinism (no leading or trailing slash).
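To illustrate why the path rule matters: "a/b" and "/a/b/" address the same location but encode to different bytes, and therefore to different CIDs. A minimal sketch, assuming the only rule is the one named above (no leading or trailing slash). Whether an implementation should normalize or reject non-canonical paths is not settled in this thread, and both helper names are hypothetical.

```python
def canonical_path(path):
    """Strip leading and trailing slashes (hypothetical normalizer)."""
    return path.strip("/")


def is_canonical_path(path):
    """True if the path already has no leading or trailing slash."""
    return path == canonical_path(path)


# Both spellings collapse to the same bytes once normalized.
same = canonical_path("/a/b/").encode("utf-8") == canonical_path("a/b").encode("utf-8")
```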

Stebalien · 2021-01-20T20:14:41Z

Wait, so you're not just concatenating a CID and a path? You're suggesting a new object type, stored as an "inline/identity" CID? I mean, that works, but it seems like just extending the CID format to allow tacking on a path would be cleaner.

mikeal · 2021-01-20T21:13:12Z

The goal here is to add this functionality in a generic way to IPLD (in other words, it should work for links to/from any existing block format) without actually breaking the IPLD Data Model (which extending the feature set of links would do).

This is “just a new block format” specifically for pathed links. That means it has a representation that conforms to the existing IPLD Data Model as it is today without any changes. Since it’s implemented as a block format but is intended to be a link itself, the sane thing to do is to embed it in an identity multihash.

It may seem a little hacky but it’s only 2 extra bytes of identity multihash overhead, which you actually gain back in the block format when compared to encoding the same data in CBOR.
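The two-byte figure can be checked with a small sketch, assuming unsigned LEB128 varints: the identity multihash adds one byte for its code (0x00) and one byte for the length, as long as the inlined block is shorter than 128 bytes. Longer blocks need a longer length varint. The helper name is hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative int as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def identity_multihash_overhead(payload_len):
    """Bytes added by wrapping payload_len bytes in an identity multihash."""
    return len(encode_varint(0x00)) + len(encode_varint(payload_len))
```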

The important thing is that there is an identifier (multicodec) in any link that you can use to identify pathed links. This would allow any IPLD user to add pathed link support to their implementation and have it work across all codecs without changing or breaking the existing data model and it would still produce graphs that contain all the relevant linking information in just the Data Model representation.

In practice, I don’t think there’s much difference between this and “extending the CID format,” other than the fact that this is backward compatible with systems that don’t understand pathed links. If you imagine extending the format, you’d end up putting bytes somewhere that say “this is a pathed link,” which we’re effectively doing with the CID’s existing codec field. We’re just eating two extra bytes for the identity multihash, which we might have avoided had we gone a route that wasn’t backward compatible.

Stebalien · 2021-01-20T21:40:24Z

I guess... My concerns are:

Unless handled "specially", these links will appear to be new blocks and would have to be handled at a higher layer (e.g., ADL). I have to wonder how this would interact with pathing, selectors, etc.

In terms of not breaking things, yeah, I get that. I'm just concerned about this feature having limited use if it lives outside the core data model.

mikeal · 2021-01-20T22:01:48Z

> In terms of not breaking things, yeah, I get that. I'm just concerned about this feature having limited use if it lives outside the core data model.

We sort of have to pick one of these. If it changes the core data model we break everything, including the existing codec definitions, so that ship has sailed.

That said, pretty much everything we’ve built w/ IPLD includes things beyond the data model. IPLD Schemas are the obvious example, and I’m curious to know if there’s a way that we could get pathed links into IPLD Schemas.

rvagg · 2021-01-21T00:45:04Z

We should enumerate some reasonable use-cases for this so we can figure out whether this proposal makes sense for them. It seems to me that there's going to be special-casing no matter how we implement such a thing. This one has the benefit of reusing the "inline CID" pattern, which I think we've agreed needs to be baked into our stack. But there's also going to be an additional "is this a 0x2f + identity CID?" check at various points of the stack, which will break some abstractions.
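The check mentioned above might look like the following sketch. It assumes binary CIDs with unsigned LEB128 varint fields and uses 0x2f as the pathed-link codec, as in this comment. The function name is hypothetical.

```python
def decode_varint(buf, pos=0):
    """Decode a varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]  # raises IndexError on truncated input
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def is_pathed_link(cid):
    """True if cid is a CIDv1 with codec 0x2f and an identity multihash."""
    try:
        version, pos = decode_varint(cid)
        codec, pos = decode_varint(cid, pos)
        hash_code, pos = decode_varint(cid, pos)
    except IndexError:
        return False  # too short to carry version, codec, and hash code
    return version == 1 and codec == 0x2F and hash_code == 0x00
```

A CIDv0 starts with the bytes 0x12 0x20, so its first varint decodes to 18 rather than 1 and the check returns False without any special handling.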
