Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tablets: consider alternative implementation #983

Open
Lorak-mmk opened this issue Apr 18, 2024 · 5 comments
Open

Tablets: consider alternative implementation #983

Lorak-mmk opened this issue Apr 18, 2024 · 5 comments
Labels
load-balancing performance Improves performance of existing features
Milestone

Comments

@Lorak-mmk
Copy link
Collaborator

Current implementation of tablets that is going to be merged stores tablets for a given table in Vec.
This is ok for searching, but insterting / deleting tablets requires moving part of the vector.
The alternative would be implementation based on some tree (maybe BTreeMap is enough, maybe we would need external crate, I'm not sure). It would make insertion / deletion logarithmic instead of linear, but could have worse cache behavior.

It's hard to guess which version will be faster in practice - we should implement both and benchmark them on some workloads.

@mykaul
Copy link
Contributor

mykaul commented Apr 18, 2024

We are aiming (at least initially) for tablet size - which isn't too small - we'd like to have 5-10GB in size.
I'm not sure if that makes the design easier. I don't think insert/delete of a tablet is something that happens that often - we won't be seeing very often split/merge.

@avikivity - thoughts?

@Lorak-mmk
Copy link
Collaborator Author

Lorak-mmk commented Apr 18, 2024

We are aiming (at least initially) for tablet size - which isn't too small - we'd like to have 5-10GB in size. I'm not sure if that makes the design easier. I don't think insert/delete of a tablet is something that happens that often - we won't be seeing very often split/merge.

Inserts to tablets structure happen any time we receive info about the tablet. This is not only during split / merge but during normal work of a new session. After session creation the vector will be empty, and all queries will be non-token-aware. Then we'll start receiving tablets because we send queries to wrong nodes and we'll insert those tablets to the Vec.

Deletes will happen when received tablet overlaps with some tablet in the vector - this will happen, iiuc, during split / merge and after topology changes (because tablets will be rebalanced).

How many tablets may there be in a single table?

@piodul
Copy link
Collaborator

piodul commented Apr 18, 2024

We are aiming (at least initially) for tablet size - which isn't too small - we'd like to have 5-10GB in size. I'm not sure if that makes the design easier. I don't think insert/delete of a tablet is something that happens that often - we won't be seeing very often split/merge.

Inserts to tablets structure happen any time we receive info about the tablet. This is not only during split / merge but during normal work of a new session. After session creation the vector will be empty, and all queries will be non-token-aware. Then we'll start receiving tablets because we send queries to wrong nodes and we'll insert those tablets to the Vec.

Aren't we going to batch updates to the tablets lookup structure and then do read-copy-update? I thought that we wanted to heavily optimized for the read case - search in Vec will be faster than lookup in BTreeMap if the structure is large enough.

@Lorak-mmk
Copy link
Collaborator Author

Lorak-mmk commented Apr 18, 2024

We are aiming (at least initially) for tablet size - which isn't too small - we'd like to have 5-10GB in size. I'm not sure if that makes the design easier. I don't think insert/delete of a tablet is something that happens that often - we won't be seeing very often split/merge.

Inserts to tablets structure happen any time we receive info about the tablet. This is not only during split / merge but during normal work of a new session. After session creation the vector will be empty, and all queries will be non-token-aware. Then we'll start receiving tablets because we send queries to wrong nodes and we'll insert those tablets to the Vec.

Aren't we going to batch updates to the tablets lookup structure and then do read-copy-update? I thought that we wanted to heavily optimized for the read case - search in Vec will be faster than lookup in BTreeMap if the structure is large enough.

We do batching, but it's not related to this issue. The procedure to update tablet structure is:

1. Receive all tablets from the channel
2. Clone ClusterData. We get new ClusterData, which contains cloned TabletsInfo.
3. For each received tablet t:
    3.1 Remove all tablets that overlap with t from TabletsInfo
    3.2 Add t to TabletsInfo
4. Replace old ClusterData with new ClusterData (which contains new tablets).

Batching here means receiving all tablets from channel in step 1, not only 1 tablet. This reduces the amount of times we need to perform step 2, which is beneficial because ClusterData is a big structure so cloning is expensive.
This issue is about optimizing step 3.

@Lorak-mmk
Copy link
Collaborator Author

Another performance optimization that I think is best to leave for a follow up: TabletReplicas should do less allocations in a typical case.
To do this, SmallVec should be used, and something similar for a map.
We should also consider using Arc for a key in the map there.

@wprzytula wprzytula added the performance Improves performance of existing features label May 14, 2024
@wprzytula wprzytula added this to the 1.1.0 milestone May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
load-balancing performance Improves performance of existing features
Projects
None yet
Development

No branches or pull requests

4 participants