
Divergence between the number of shards in the metastore and the number seen from the control plane #5008

Closed
fulmicoton opened this issue May 21, 2024 · 4 comments · Fixed by #5029
Labels
bug Something isn't working

Comments

@fulmicoton (Contributor) commented May 21, 2024

As observed on airmail:

select count(*) AS shards from shards;
 shards 
--------
 763471
(1 row)

Meanwhile, the control plane sees a much lower shard count (around 3,000).

fulmicoton added the bug label May 21, 2024
@fulmicoton (Contributor, Author) commented:

This is probably not the root cause, but it does create a divergence between the metastore and the control plane (CP):

        // Auto-create any missing indexes; on failure, short-circuit and
        // hand the error back to the caller wrapped in Ok(..).
        if let Err(control_plane_error) = self
            .auto_create_indexes(&request.subrequests, ctx.progress())
            .await
        {
            return Ok(Err(control_plane_error));
        }

@fulmicoton (Contributor, Author) commented:

I suspect the problem is that we:

- open the shard on the metastore
- init the shard on the ingester
- only update the model for successfully initialized shards

So if an ingester happens to be unreachable, etc., we end up with dangling shards in the metastore, as sketched below.
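
A minimal sketch of that suspected ordering; identifiers such as init_shard and insert_shard are hypothetical, for illustration only, not the actual Quickwit code:

    // Hypothetical sketch of the suspected buggy ordering.
    // 1. The shards are recorded in the metastore first.
    let open_shards_response = metastore.open_shards(open_shards_request).await?;

    for shard in open_shards_response.opened_shards {
        // 2. Each shard is then initialized on its ingester.
        match ingester.init_shard(&shard).await {
            // 3. Only successfully initialized shards reach the model...
            Ok(()) => model.insert_shard(&source_uid, shard),
            // ...so when an ingester is unreachable, the shard written in
            // step 1 stays in the metastore while the control plane model
            // never learns about it: a dangling shard.
            Err(_) => continue,
        }
    }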

@fulmicoton (Contributor, Author) commented:

The scale_up shards path seems buggy too:

        // Open new shards in the metastore; on failure, release the
        // scaling permits and abort the scale-up.
        let open_shards_response = match progress
            .protect_future(self.metastore.open_shards(open_shards_request))
            .await
        {
            Ok(open_shards_response) => open_shards_response,
            Err(error) => {
                warn!("failed to scale up number of shards: {error}");
                model.release_scaling_permits(&source_uid, ScalingMode::Up, NUM_PERMITS);
                return;
            }
        };
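
The metastore failure above is at least handled (the permits are released), but a hedged sketch of what presumably follows shows the real hazard; init_shards_on_ingesters and insert_shard are hypothetical names:

    // Hypothetical continuation: at this point the new shards already
    // exist in the metastore.
    match init_shards_on_ingesters(&open_shards_response.opened_shards).await {
        Ok(initialized_shards) => {
            for shard in initialized_shards {
                model.insert_shard(&source_uid, shard);
            }
        }
        Err(error) => {
            // Unlike the metastore error above, this failure happens *after*
            // the shards were opened: returning here leaves them dangling in
            // the metastore, unknown to the control plane model.
            warn!("failed to init shards: {error}");
            model.release_scaling_permits(&source_uid, ScalingMode::Up, NUM_PERMITS);
        }
    }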

fulmicoton added a commit that referenced this issue May 21, 2024
@fulmicoton (Contributor, Author) commented:

Split into #5013, #5014, and #5020.

fulmicoton added a commit that referenced this issue May 30, 2024
* Fixes the control plane/metastore inconsistency caused by opening shards.

On scale up, rebalance, and get_or_open_shards, the control plane was:
- recording the shard in the metastore
- writing the shard
- recording the shard in the control plane model.

Error handling was missing, so a transport failure on the metastore, or
(more likely) a failure on init, would break consistency between the
metastore and the control plane.

This PR factors out the logic for opening a new shard, shared by scale
up, rebalance, and get_or_open_shards.

The new factored-out logic goes:
- init first
- record in the metastore
- record the shard in the control plane model.

The last two steps are also grouped together to emphasize that we keep
the control plane and metastore in sync.

It works by forcing a restart of the control plane if the metastore
returns an error for which we cannot tell whether the write succeeded.

Closes #5008
Closes #5020
Closes #5013

* Apply suggestions from code review

Co-authored-by: Adrien Guillo <adrien@quickwit.io>
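
A minimal sketch of the fixed ordering described in this commit message; init_shards_on_ingesters, insert_shard, and the is_transport_error classification are hypothetical names, and the merged code may differ:

    // 1. Init on the ingesters first: a failure here leaves no trace in
    //    either the metastore or the control plane model.
    let initialized_shards = init_shards_on_ingesters(&candidate_shards).await?;

    // 2. and 3. Record in the metastore, then in the model. The two steps
    // live in one helper so the two views cannot drift apart.
    match metastore.open_shards(open_shards_request(&initialized_shards)).await {
        Ok(response) => {
            for shard in response.opened_shards {
                model.insert_shard(&source_uid, shard);
            }
        }
        Err(error) if error.is_transport_error() => {
            // We cannot tell whether the metastore write went through, so
            // the in-memory model may be stale: force a control plane
            // restart to rebuild the model from the metastore.
            panic!("ambiguous metastore error: {error}");
        }
        Err(error) => {
            return Err(error.into());
        }
    }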