This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Self describing objects in the data model layer #312

Open
tysonzero opened this issue Oct 13, 2020 · 2 comments

@tysonzero

tysonzero commented Oct 13, 2020

I have been playing around with building a programming language on top of the IPLD data model.

It seems ideal to allow lists, integers and floats in this language to map directly to the equivalent data model kinds.

However, I also need to encode things like lambdas and case statements, and these objects can appear more or less anywhere within other objects such as lists.

Currently this means I need to choose one data model kind to not use directly, and instead treat as a tag/wrapper. For example I could store all lists in my language as ["list", [1, 2, 3]] which frees up ["lambda", ...] and similar.
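A minimal sketch of this wrapper convention, assuming plain Python values stand in for data model kinds. The names `encode_list`, `encode_lambda` and `decode` are hypothetical, and the lambda payload layout is invented purely for illustration:

```python
# Sketch only: every language-level list is stored as a two-element
# data model list [tag, payload], so bare lists never appear directly.

def encode_list(items):
    """Wrap a language-level list so it cannot be mistaken for a tagged node."""
    return ["list", list(items)]


def encode_lambda(param, body):
    """Hypothetical lambda layout: a parameter name plus an encoded body."""
    return ["lambda", {"param": param, "body": body}]


def decode(node):
    """Return (kind, payload); non-list values pass through as literals."""
    if isinstance(node, list):
        if len(node) != 2 or not isinstance(node[0], str):
            raise ValueError("every data model list must be a [tag, payload] pair")
        return node[0], node[1]
    return "literal", node
```

The cost is visible in `decode`: the list kind is no longer available for plain data, which is the trade-off described above.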

This is by no means a deal breaker, and I can provide things like a literal function that converts external IPLD data model objects into literals in the language, but it seems worthy of discussion.

The way CBOR itself addresses things like this is via tags (IPLD has registered tag 42 for CIDs), so a similar feature may be appropriate for the data model.
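For reference, a hand-rolled sketch of how a CBOR tag wraps a value at the byte level, using the tag 42 link encoding from DAG-CBOR (a byte string holding 0x00 followed by the binary CID). The helper names are hypothetical, and DAG-CBOR itself permits no tag other than 42, so any other tag number passed to `cbor_tagged_bytes` would be outside the current spec:

```python
def cbor_head(major_type, value):
    """Encode a CBOR item head: major type plus an unsigned argument."""
    if value < 24:
        return bytes([(major_type << 5) | value])
    if value < 2**8:
        return bytes([(major_type << 5) | 24, value])
    if value < 2**16:
        return bytes([(major_type << 5) | 25]) + value.to_bytes(2, "big")
    if value < 2**32:
        return bytes([(major_type << 5) | 26]) + value.to_bytes(4, "big")
    return bytes([(major_type << 5) | 27]) + value.to_bytes(8, "big")


def cbor_tagged_bytes(tag, payload):
    """A tag (major type 6) wrapping a byte string (major type 2)."""
    return cbor_head(6, tag) + cbor_head(2, len(payload)) + payload


def dag_cbor_link(cid_bytes):
    """DAG-CBOR link: tag 42 around 0x00 followed by the binary CID."""
    return cbor_tagged_bytes(42, b"\x00" + cid_bytes)
```

A data model level tag would need an equivalent representation in every codec, which is part of what makes the feature non-trivial.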

It seems as though we can take advantage of IPLD/IPFS here and use something like a CID for the tag, as the extreme compactness of multicodec seems overkill. This would avoid having to centrally ration tags out or worry about false collisions.

This may also be relevant to the discussions about MIME types. I personally agree with the stance that multicodec is an inappropriate place to put a MIME type, particularly since you would seemingly need something like both dag-pb-jpg and raw-jpg to differentiate the full file node from the broken-up chunks. You could, however, store that kind of information in such a tag.

This also seems like it could be relevant to schemas, as objects could be directly tagged with the schema they are supposed to follow, by storing the schema as an IPLD object and pointing to its CID.

@rvagg
Member

rvagg commented Oct 13, 2020

This is probably going to get me in trouble, but CIDs with identity multihash are a bit of an escape hatch that could be treated something like a tag. In addition, we also recently added a "reserved range" (mainly for experimentation purposes) in the multicodec table that could be combined with identity multihash to do ... creative things.
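A rough sketch of the escape hatch, assuming a binary CIDv1 laid out as version, codec, then multihash, with every field an unsigned varint. With the identity multihash (code 0x00) the "digest" is the payload itself, so the CID can carry a short label inline. The function names are hypothetical, and the raw codec (0x55) is used only as a default; a code from the reserved range mentioned above could be passed instead:

```python
RAW_CODEC = 0x55      # multicodec code for raw bytes
IDENTITY_HASH = 0x00  # multihash code for the identity function


def varint(n):
    """Unsigned LEB128 varint, as used by multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf, pos):
    """Decode one varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def identity_cid(payload, codec=RAW_CODEC):
    """Binary CIDv1 whose multihash digest is the payload itself."""
    multihash = varint(IDENTITY_HASH) + varint(len(payload)) + payload
    return varint(1) + varint(codec) + multihash


def identity_payload(cid):
    """Recover (codec, payload) from a CIDv1 with an identity multihash."""
    version, pos = read_varint(cid, 0)
    codec, pos = read_varint(cid, pos)
    hash_code, pos = read_varint(cid, pos)
    length, pos = read_varint(cid, pos)
    if version != 1 or hash_code != IDENTITY_HASH:
        raise ValueError("not a CIDv1 with an identity multihash")
    return codec, cid[pos:pos + length]
```

For example, `identity_cid(b"lambda")` yields a ten-byte CID that a reader could interpret as a tag rather than resolve as a link, though nothing in IPLD assigns it that meaning.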

Pointing to a data model form of a schema with a CID that's embedded in the block itself is something we've discussed a fair bit but we've never got to the point of pulling the trigger on that - I think partly because we just haven't got enough practical use of schemas yet to test the viability (and sensibility) of this.

Perhaps there's a way to combine ideas here: identity multihash, and using CIDs as pointers to other things within (or outside) a doc. It's certainly something you'd want to do a lot more experimenting with before we baked anything official into IPLD; you're dabbling in mad science here after all!

@vmx is also heads-down in WASM land, chasing a vision that @mikeal has been primarily pushing: being able to embed WASM code in IPLD blocks. The first use-case is to get codecs into WASM so you don't need native codecs to interpret data. But beyond that there's a lot of scope for being able to do things like traverse complex data structures using an algorithm that itself is in IPLD and can be fetched by a CID. Perhaps this might be an interesting area for you to explore too?

@tysonzero
Author

Thanks for the info!

In practice you can probably get away with just using {type = x, value = y} dicts in most cases and not run into many problems, but it seems like in some situations you may end up with a lot of {type = <string>, value = "<y>"} instead of just "<y>".
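To make the overhead concrete, a small sketch that wraps each string in a `type`/`value` map and compares serialized sizes. Compact JSON is used here only as a stand-in for a real IPLD codec, and `wrap` and `unwrap` are hypothetical helpers:

```python
import json


def wrap(kind, value):
    """Tag a value by nesting it in a two-key map."""
    return {"type": kind, "value": value}


def unwrap(node):
    """Return (kind, value) from a wrapped node."""
    return node["type"], node["value"]


strings = ["a", "b", "c"]

# Compact JSON stands in for the block encoding; the point is the ratio.
bare = json.dumps(strings, separators=(",", ":"))
wrapped = json.dumps([wrap("string", s) for s in strings], separators=(",", ":"))

print(len(bare), len(wrapped))
```

Every element pays for the repeated `type` and `value` keys, which is the per-value cost that a first-class tag would avoid.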

I will keep playing around with the IPLD programming language stuff and see what seems to come up in practice.

> Pointing to a data model form of a schema with a CID that's embedded in the block itself is something we've discussed a fair bit but we've never got to the point of pulling the trigger on that

Personally I'm on the fence about such a thing, as it'd require hardcoding the choice of schema language, which means it'd necessarily be too coarse for certain data types.

If the tag is just an arbitrary IPLD structure then you avoid the above problem, allowing people to opt in to a schema language of their choice. They could of course still use the schema language provided in this repo.

You could get some of the benefits of a standardized schema language whilst avoiding the above problem by making sure the schema language is extensible. For sufficiently complex data types an overly coarse IPLD schema can be given, and a custom precise schema can be added within it.

This still ends with the IPLD data model spec being quite a bit more complicated, as you effectively have to embed the entire IPLD schema spec within it, rather than it just lying on top as an independent spec.
