Skip to content
This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Request: Magic AMT #340

Open
Stebalien opened this issue Dec 3, 2020 · 2 comments
Open

Request: Magic AMT #340

Stebalien opened this issue Dec 3, 2020 · 2 comments

Comments

@Stebalien
Copy link
Contributor

Stebalien commented Dec 3, 2020

I'd really love an AMT where:

  • The bitwidth wasn't fixed for all levels. E.g., start with a bitwidth of 3 and then increase by 1 bit (double the elements) per level. I'm not sure how to do this and avoid re-writing all nodes on resize.
  • The number of elements in the leaves was proportional to the size of the leaf, not the actual number of elements. This is far from easy, but it would be so nice.
  • The depth wasn't related to the maximum element. We'd need to include offsets to make this work, but that shouldn't be difficult.
  • Better support for sparse AMTs. HAMTs don't have this issue because they store KV pairs. AMTs can more efficiently pack values by not storing the keys. A solution here would be to map offsets to sequential ranges (also fixes the depth issue).

At the end of the day, I'm thinking something like (rough idea):

type AMT struct {
    ...
    node AMTNode
}

// This expands into an array indexed by "bitwidth" bits of the index.
type AMTNode struct {
    prefix     Int // I need _some_ way to say "this is a common prefix all elements share.
    bitmap   Bytes
    children [AMTElement]
}

// If an element is a shard, we lookup the key within that shard. Otherwise, we load the child node.
type AMTElement union {
    | &AMTNode link
    | AMTShard map
} representation kinded

// A shard is a sequence of values at some offset.
type AMTShard struct {
    offset Int
    values [Any]
}

The tricky part here is deciding how big these shards should be and deciding when they need to be popped out into their own nodes.

Edit: I think we're going to need an additional offset

@Stebalien
Copy link
Contributor Author

Use-cases:

  • Queue that starts at an arbitrary offset. Currently, this offset will force a minimum depth of log(offset). This is a problem for pretty much all of our epoch-based queues in filecoin. Unfortunately, we also need random insertion/deletion.
  • AMT of really small objects. We want to create an AMT that just stores deal IDs, but these are so small that the current branching factor of 8 leads to ridiculously small leaves. Our only solution at the moment is to arbitrarily make leaves larger.
  • AMTs that efficiently store ranges at random offsets. This is basically what our sectors AMT looks like as miners will allocate batches of sectors at different offsets. Unfortunately, this means we end up with really deep AMT.

@rvagg
Copy link
Member

rvagg commented Dec 7, 2020

I keep on thinking about this as a way to iterate on the current AMT filecoin-project/go-amt-ipld#17 (comment), I just haven't had space to actually try and implement it. Keeping current semantics in place but adding the offset to allow for vertical compaction might end up making the structure quite reasonable for most of these use-cases. Bringing the branching factor up closer to the HAMT which is at 32, would probably help too.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants