
tx indexer miss block when restart #7312

Closed
yihuang opened this issue Nov 24, 2021 · 4 comments · May be fixed by #9438
Labels
stale for use by stalebot

Comments

@yihuang
Contributor

yihuang commented Nov 24, 2021

Tendermint version: v0.34.14

ABCI app:

Environment:

  • OS (e.g. from /etc/os-release):
  • Install tools:
  • Others:

What happened:

We found that our node is missing tx indexes for certain blocks.
While investigating the code, we found that if the node is stopped while the tx indexer is indexing a block, that block is not picked up again after the restart, so it is never indexed.
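To make the failure mode concrete, here is a toy model in Go. It is not Tendermint code: indexStore, runIndexer, and missingHeights are hypothetical names, and the model assumes (as described above) that the indexer only sees blocks delivered on its event subscription and that a restarted node resumes publishing events from the next height, so the in-flight block is never redelivered.

```go
package main

// indexStore is a hypothetical stand-in for the tx index. It only records
// which block heights had their batch committed (the AddBatch step).
type indexStore map[int64]bool

// runIndexer is a simplified model of the event-driven indexer loop. It
// indexes only the heights delivered in events. If the node stops after the
// header event for stopAt is received but before its batch is committed,
// that height is dropped, and nothing records that it was in flight.
func runIndexer(events []int64, stopAt int64, store indexStore) {
	for _, h := range events {
		if h == stopAt {
			return // node stopped while this block was being indexed
		}
		store[h] = true // batch committed
	}
}

// missingHeights returns every height in [from, to] that was never indexed.
func missingHeights(store indexStore, from, to int64) []int64 {
	var missing []int64
	for h := from; h <= to; h++ {
		if !store[h] {
			missing = append(missing, h)
		}
	}
	return missing
}
```

For example, a first run receiving heights 60, 61, 62 that stops during 62, followed by a restarted run receiving 63 and 64, leaves missingHeights(store, 60, 64) equal to [62], matching the gap described in this issue.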

What you expected to happen:

The tx indexer should index every block.

Have you tried the latest version: no

How to reproduce it (as minimally and precisely as possible):

  1. Patch v0.34.14 as follows to make the problem easier to reproduce:
--- a/state/txindex/indexer_service.go
+++ b/state/txindex/indexer_service.go
@@ -2,6 +2,7 @@ package txindex

 import (
        "context"
+       "time"

        "github.com/tendermint/tendermint/libs/service"
        "github.com/tendermint/tendermint/state/indexer"
@@ -61,6 +62,7 @@ func (is *IndexerService) OnStart() error {
                        eventDataHeader := msg.Data().(types.EventDataNewBlockHeader)
                        height := eventDataHeader.Header.Height
                        batch := NewBatch(eventDataHeader.NumTxs)
+                       is.Logger.Error("[debug] start index block", "height", height)

                        for i := int64(0); i < eventDataHeader.NumTxs; i++ {
                                msg2 := <-txsSub.Out()
@@ -82,11 +84,14 @@ func (is *IndexerService) OnStart() error {
                                is.Logger.Info("indexed block", "height", height)
                        }

+                       time.Sleep(time.Second)
                        if err = is.txIdxr.AddBatch(batch); err != nil {
                                is.Logger.Error("failed to index block txs", "height", height, "err", err)
                        } else {
                                is.Logger.Debug("indexed block txs", "height", height, "num_txs", eventDataHeader.NumTxs)
                        }
+
+                       is.Logger.Error("[debug] finish index block", "height", height)
                }
        }()
        return nil
  2. Start a devnet and restart the node at random times.
  3. You should see logs similar to these:
...
2:16PM ERR [debug] start index block height=62 module=txindex server=node
2:16PM ERR [debug] start index block height=63 module=txindex server=node
2:16PM ERR [debug] finish index block height=63 module=txindex server=node
...

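This means block 62 was never indexed: its "start" line has no matching "finish" line, and after the restart the indexer resumed at height 63.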
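When a devnet produces many restarts, spotting unmatched lines by eye gets tedious. The following Go sketch scans log lines in the format produced by the debug patch above and reports heights that logged "start" but never "finish". The function name unfinishedBlocks is hypothetical, and the regular expression assumes the exact "[debug] start/finish index block height=N" wording from the patch.

```go
package main

import (
	"regexp"
	"sort"
	"strconv"
)

// debugLine matches the lines added by the reproduction patch, e.g.
// "2:16PM ERR [debug] start index block height=62 module=txindex server=node".
var debugLine = regexp.MustCompile(`\[debug\] (start|finish) index block height=(\d+)`)

// unfinishedBlocks returns, in ascending order, the heights that logged a
// "start" line without a later matching "finish" line.
func unfinishedBlocks(lines []string) []int64 {
	started := map[int64]bool{}
	for _, line := range lines {
		m := debugLine.FindStringSubmatch(line)
		if m == nil {
			continue // not one of the debug lines
		}
		h, err := strconv.ParseInt(m[2], 10, 64)
		if err != nil {
			continue
		}
		if m[1] == "start" {
			started[h] = true
		} else {
			delete(started, h)
		}
	}
	out := make([]int64, 0, len(started))
	for h := range started {
		out = append(out, h)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}
```

Run against the three log lines above, it reports only height 62. A block that is still being indexed when the log is captured will also show up, so the last reported height should be checked against a later log before treating it as lost.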

@JayT106
Contributor

JayT106 commented Nov 24, 2021

I think #7231 solves this issue: once the node restarts, it reindexes the missing block during the ABCI handshake.

@creachadair, would it be possible to backport it to v0.34/v0.35?
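As an illustration of what a startup catch-up step can look like in general, here is a Go sketch. It is explicitly not the code from #7231: catchUp is a hypothetical function, and isIndexed and reindex are assumed callbacks that a real node would back with its block store, state store, and indexer.

```go
package main

import "fmt"

// catchUp is a hypothetical startup step (not the #7231 implementation).
// Before the indexer begins consuming live events, it walks every height
// from base to storeHeight (the last block the node has committed) and
// reindexes any height the index has no record of.
func catchUp(base, storeHeight int64, isIndexed func(int64) bool, reindex func(int64) error) ([]int64, error) {
	var redone []int64
	for h := base; h <= storeHeight; h++ {
		if isIndexed(h) {
			continue
		}
		if err := reindex(h); err != nil {
			return redone, fmt.Errorf("reindex height %d: %w", h, err)
		}
		redone = append(redone, h)
	}
	return redone, nil
}
```

With heights 60, 61, and 63 already indexed and a store height of 63, catchUp(60, 63, ...) reindexes only 62. Scanning every height is simple but linear in chain length; a persisted "last indexed height", written only after AddBatch succeeds, would bound the scan, at the cost of relying on blocks being indexed strictly in order.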

@creachadair
Contributor


It might be possible to backport into v0.35, but I'll have to check API compatibility. It is not practical for v0.34.

@JayT106
Contributor

JayT106 commented Nov 25, 2021


Sounds good! thanks!

github-actions bot added the stale label (for use by stalebot) May 30, 2023
github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jun 4, 2023
@yzang2019

This issue is still not fixed: transactions are still missing from the index after a restart, even with the patch from #7231 applied.
