[Bug]: NodeNotFound when trying to load collection after milvus update and bulk insert #33200

Closed
1 task done
zhengbuqian opened this issue May 20, 2024 · 2 comments
Labels
kind/bug Issues or changes related a bug priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on.

@zhengbuqian
Collaborator

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master
- Deployment mode(standalone or cluster): standalone

Current Behavior

The hello_milvus.py script mentioned below creates a collection, inserts data, creates an index, and loads the collection. It does not drop the index or the collection. "Bulk pytest" refers to running cd tests/python_client/testcases && pytest test_bulk_insert.py::TestBulkInsert::test_bulk_insert_all_field_with_parquet.

Before every test, everything in etcd and MinIO is dropped to ensure a fresh start.

Commit order: 5c6de47 (3 days ago) -> c6e2dd05 (19 hours ago) -> a7f6193bfc (10 hours ago) -> now (0520 9 pm)

Test A:

  1. Built Milvus at commit 5c6de47 and ran hello_milvus.py.
  2. Ran bulk pytest. All tests passed.

Test B:

  1. Built Milvus at commit 5c6de47 and ran hello_milvus.py.
  2. Rebuilt Milvus at commit a7f6193bfc.
  3. Ran bulk pytest. The test failed with a NodeNotFound error when trying to load the collection: <MilvusException: (code=65535, message=NodeNotFound)> (api_request.py:46)

Test C:

  1. Built Milvus at commit 5c6de47 and ran hello_milvus.py.
  2. Rebuilt Milvus at commit c6e2dd05.
  3. Ran bulk pytest. All tests passed.

Test D:

  1. Built Milvus at commit a7f6193bfc and ran hello_milvus.py.
  2. Ran bulk pytest. All tests passed.

From the above test results, it seems that commit a7f6193bfc introduced a compatibility issue: when Milvus is rebuilt on it over state written by an older commit, collections with segments created by bulk insert fail to load.
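
The inference from Tests A to D can be written down as a small table. The sketch below is illustrative only: the `Run` record and the `failing_upgrades` helper are hypothetical names, and the data is just the four runs as reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    data_commit: str  # commit that ran hello_milvus.py and wrote the initial state
    test_commit: str  # commit that ran the bulk insert pytest
    passed: bool


RUNS = [
    Run("A", "5c6de47", "5c6de47", True),
    Run("B", "5c6de47", "a7f6193bfc", False),
    Run("C", "5c6de47", "c6e2dd05", True),
    Run("D", "a7f6193bfc", "a7f6193bfc", True),
]


def failing_upgrades(runs):
    """Return (old, new) commit pairs that fail only across a rebuild.

    A pair qualifies when the run failed, the two commits differ, and the
    new commit passes on its own from a fresh start.
    """
    clean_pass = {
        r.test_commit for r in runs if r.passed and r.data_commit == r.test_commit
    }
    return [
        (r.data_commit, r.test_commit)
        for r in runs
        if not r.passed
        and r.data_commit != r.test_commit
        and r.test_commit in clean_pass
    ]


print(failing_upgrades(RUNS))
```

Only Test B qualifies, which is why the report points at the upgrade path to a7f6193bfc rather than at the commit in isolation.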

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@zhengbuqian zhengbuqian added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 20, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 21, 2024
@yanliang567 yanliang567 added this to the 2.4.2 milestone May 21, 2024
@yanliang567 yanliang567 added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label May 21, 2024
sre-ci-robot pushed a commit that referenced this issue May 21, 2024
issue: #33200 #33207

pr#33104 removed this logic by mistake, which causes offline nodes to
be kept in the replica after qc recovers, and requests sent to an offline qn
get a NodeNotFound error.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
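
To illustrate the failure mode this commit message describes, here is a minimal Python model. It is not Milvus code (Milvus itself is written in Go), and every name in it (`Replica`, `recover_replica`, `send_request`, and this `NodeNotFound` class) is hypothetical. The sketch assumes only that a recovered replica may still list node IDs from before the restart, and that a request to a node missing from the live set fails.

```python
class NodeNotFound(Exception):
    """Raised when a request targets a node that is not online (hypothetical)."""


class Replica:
    def __init__(self, node_ids):
        self.node_ids = set(node_ids)


def recover_replica(persisted_node_ids, online_node_ids, drop_offline=True):
    """Rebuild a replica from persisted metadata after a coordinator restart.

    With drop_offline=True, nodes that are no longer online are removed.
    This stands in for the step the commit message says was removed by mistake.
    """
    nodes = set(persisted_node_ids)
    if drop_offline:
        nodes &= set(online_node_ids)
    return Replica(nodes)


def send_request(replica, online_node_ids):
    """Pretend to dispatch a load request to every node in the replica."""
    for node_id in sorted(replica.node_ids):
        if node_id not in online_node_ids:
            raise NodeNotFound(f"node {node_id} is offline")
    return sorted(replica.node_ids)


persisted = {1, 2}  # node 1 belonged to the previous process and is gone
online = {2}

fixed = recover_replica(persisted, online, drop_offline=True)
print(send_request(fixed, online))

buggy = recover_replica(persisted, online, drop_offline=False)
try:
    send_request(buggy, online)
except NodeNotFound as exc:
    print("NodeNotFound:", exc)
```

With the offline node pruned, the request reaches only node 2. Without pruning, the stale node 1 is still targeted and the request fails.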
weiliu1031 added a commit to weiliu1031/milvus that referenced this issue May 21, 2024
issue: milvus-io#33200 milvus-io#33207

pr#33104 removed this logic by mistake, which causes offline nodes to
be kept in the replica after qc recovers, and requests sent to an offline qn
get a NodeNotFound error.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue May 22, 2024
issue: #33200 #33207
pr#33104 causes offline nodes to be kept in the resource group after qc
recovers; an offline node is then assigned to a new replica as an rwNode, and
requests sent to those nodes fail with NodeNotFound.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
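
The chain in this commit message (stale resource group, then replica assignment, then failed request) can be sketched the same way. This is a toy model under stated assumptions, not the Milvus implementation: the function names are hypothetical, and the round-robin assignment is an assumption made only to keep the example concrete.

```python
def recover_resource_group(persisted_nodes, online_nodes, prune_offline):
    """Return the node set of a resource group after coordinator recovery."""
    nodes = set(persisted_nodes)
    return nodes & set(online_nodes) if prune_offline else nodes


def assign_rw_nodes(resource_group_nodes, replica_count):
    """Spread resource group nodes round-robin over new replicas as read-write nodes."""
    replicas = [[] for _ in range(replica_count)]
    for i, node_id in enumerate(sorted(resource_group_nodes)):
        replicas[i % replica_count].append(node_id)
    return replicas


def unreachable(replicas, online_nodes):
    """List nodes that a replica would target but that are not online."""
    return sorted(n for nodes in replicas for n in nodes if n not in online_nodes)


persisted, online = {7, 8}, {8}  # node 7 is left over from before the restart

stale = assign_rw_nodes(
    recover_resource_group(persisted, online, prune_offline=False), 1
)
clean = assign_rw_nodes(
    recover_resource_group(persisted, online, prune_offline=True), 1
)

print(stale, unreachable(stale, online))
print(clean, unreachable(clean, online))
```

In the unpruned case the new replica inherits node 7 from the resource group, so any request routed to it would hit a node that no longer exists.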
sre-ci-robot pushed a commit that referenced this issue May 22, 2024
issue: #33200 #33207
pr: #33232
pr#33104 causes offline nodes to be kept in the resource group after qc
recovers; an offline node is then assigned to a new replica as an rwNode, and
requests sent to those nodes fail with NodeNotFound.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
@weiliu1031
Contributor

Please verify this with the latest image.

@weiliu1031
Contributor

/assign @zhengbuqian


3 participants