
[Bug]: When importing a sparse vector, if the format is coordinate list, it will fail. #33162

Closed
1 task done
zhuwenxing opened this issue May 20, 2024 · 4 comments
Assignees
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone


@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master-20240520-a7f6193b-amd64
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2024-05-20 14:07:32 - INFO - ci_test]: df: 
     int_scalar  ...                                     sparse_vectors
0       1233521  ...  {"indices": [531, 904, 935, 936, 852, 771, 163...
1       -448143  ...  {"indices": [649, 493, 985, 8, 958, 986, 235, ...
2       7730545  ...  {"indices": [99, 259, 906, 753, 992, 385, 377,...
3       9615060  ...  {"indices": [424, 57, 134, 149, 335, 509, 966,...
4       8520208  ...  {"indices": [391, 19, 121, 964, 925, 372, 815,...
..          ...  ...                                                ...
995      336105  ...  {"indices": [650, 315, 333, 803, 629, 948, 738...
996     4408771  ...  {"indices": [341, 358, 504, 515, 59, 164, 180,...
997     2639893  ...  {"indices": [321, 74, 858, 122, 446, 994, 865,...
998     8129081  ...  {"indices": [302, 714, 979, 530, 581, 184, 122...
999     8146023  ...  {"indices": [743, 298, 747, 694, 50, 197, 509,...

[1000 rows x 10 columns] (bulk_insert_data.py:800)
[2024-05-20 14:07:33 - INFO - ci_test]: copied data-fields-10-rows-1000-dim-128-file-num-1-error-none-1716185252.parquet to minio (minio_comm.py:26)
[2024-05-20 14:07:33 - INFO - ci_test]: before bulk load, there are 0 working tasks (utility_wrapper.py:25)
[2024-05-20 14:07:33 - INFO - ci_test]: files to load: ['data-fields-10-rows-1000-dim-128-file-num-1-error-none-1716185252.parquet'] (utility_wrapper.py:26)
[2024-05-20 14:07:34 - INFO - ci_test]: after bulk load, there are 0 working tasks (utility_wrapper.py:34)
[2024-05-20 14:07:34 - INFO - root]: bulk insert task ids:449885606406242661 (test_bulk_insert.py:1255)
[2024-05-20 14:07:34 - INFO - ci_test]: wait bulk load timeout is 300 (utility_wrapper.py:111)
[2024-05-20 14:07:34 - INFO - ci_test]: before waiting, there are 1 pending tasks (utility_wrapper.py:113)
[2024-05-20 14:07:38 - INFO - ci_test]: after waiting, there are 0 pending tasks (utility_wrapper.py:148)
[2024-05-20 14:07:38 - INFO - ci_test]: task state distribution: {'success': set(), 'failed': {449885606406242661}, 'in_progress': {449885606406242661}} (utility_wrapper.py:149)
[2024-05-20 14:07:38 - INFO - ci_test]: {449885606406242661: <Bulk insert state:
    - taskID          : 449885606406242661,
    - state           : Failed,
    - row_count       : 0,
    - infos           : {'failed_reason': 'Invalid JSON string for SparseFloatVector: \'{"indices": [531, 904, 935, 936, 852, 771, 163, 332, 225, 176, 788, 128, 393, 660, 809, 205, 540, 217, 38, 428], "values": [0.9958948589184143, 0.6461136825418565, 0.6977618948194418, 0.42113632964432557, 0.01821311083715005, 0.9234855228111372, 0.33373338211872383, 0.6031843407707047, 0.08635858918899764, 0.5509549598582778, 0.49912514883213954, 0.03610733775513264, 0.09970620418674869, 0.5936325625760646, 0.41681613013903474, 0.037089529774458896, 0.15990187129850642, 0.40244075546375435, 0.019592840988600257, 0.8887417420432306]}\': importing data failed', 'progress_percent': '0'},
    - id_ranges       : [],
    - create_ts       : 2024-05-20 14:07:33
>} (utility_wrapper.py:150)
[2024-05-20 14:07:38 - INFO - ci_test]: wait for bulk load tasks completed failed, cost time: 4.06017279624939 (utility_wrapper.py:155)
[2024-05-20 14:07:38 - INFO - ci_test]: bulk insert state:False in 5.137051820755005 with states:{449885606406242661: <Bulk insert state:
    - taskID          : 449885606406242661,
    - state           : Failed,
    - row_count       : 0,
    - infos           : {'failed_reason': 'Invalid JSON string for SparseFloatVector: \'{"indices": [531, 904, 935, 936, 852, 771, 163, 332, 225, 176, 788, 128, 393, 660, 809, 205, 540, 217, 38, 428], "values": [0.9958948589184143, 0.6461136825418565, 0.6977618948194418, 0.42113632964432557, 0.01821311083715005, 0.9234855228111372, 0.33373338211872383, 0.6031843407707047, 0.08635858918899764, 0.5509549598582778, 0.49912514883213954, 0.03610733775513264, 0.09970620418674869, 0.5936325625760646, 0.41681613013903474, 0.037089529774458896, 0.15990187129850642, 0.40244075546375435, 0.019592840988600257, 0.8887417420432306]}\': importing data failed', 'progress_percent': '0'},
    - id_ranges       : [],
    - create_ts       : 2024-05-20 14:07:33
>} (test_bulk_insert.py:1260)
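The rejected cell is a JSON string in coordinate-list form: two parallel arrays, `indices` and `values`. The other common way to write a sparse row is a single object that maps each index to its value. Below is a minimal Python sketch of a parser that accepts both shapes and returns an index-to-value dict. It is not Milvus code. The name `parse_sparse_row` and the uint32 bound on indices are assumptions made for illustration.

```python
import json


def parse_sparse_row(text: str) -> dict[int, float]:
    """Decode one sparse-vector cell (a JSON string) into {index: value}.

    Hypothetical helper; accepts two shapes:
      coordinate list: {"indices": [531, 904], "values": [0.99, 0.64]}
      index map:       {"531": 0.99, "904": 0.64}
    """
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("sparse vector must be a JSON object")

    if set(obj) == {"indices", "values"}:
        # Coordinate-list form: two parallel arrays of equal length.
        indices, values = obj["indices"], obj["values"]
        if not isinstance(indices, list) or not isinstance(values, list):
            raise ValueError("indices and values must be arrays")
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
        pairs = list(zip(indices, values))
    else:
        # Index-map form: JSON object keys are always strings of digits here.
        pairs = []
        for key, value in obj.items():
            if not (key.isascii() and key.isdigit()):
                raise ValueError(f"invalid index key: {key!r}")
            pairs.append((int(key), value))

    result: dict[int, float] = {}
    for index, value in pairs:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(index, bool) or not isinstance(index, int):
            raise ValueError(f"invalid index: {index!r}")
        if not 0 <= index < 2**32:  # assumed uint32 index range
            raise ValueError(f"index out of range: {index}")
        result[index] = float(value)
    return result
```

For example, `parse_sparse_row('{"indices": [531, 904], "values": [0.5, 0.25]}')` and `parse_sparse_row('{"531": 0.5, "904": 0.25}')` both produce `{531: 0.5, 904: 0.25}`. The sketch detects the shape from the object's keys rather than from a separate format flag.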

Expected Behavior

Import succeeds.

Steps To Reproduce

No response

Milvus Log

standalone.log

Anything else?

No response

@zhuwenxing zhuwenxing added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 20, 2024
@yanliang567
Contributor

/assign @cydrain
/unassign

@sre-ci-robot sre-ci-robot assigned cydrain and unassigned yanliang567 May 20, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 20, 2024
@yanliang567 yanliang567 added this to the 2.4.2 milestone May 20, 2024
@zhuwenxing zhuwenxing changed the title [Bug]: When importing a sparse vector, if the format is {"indices": ["1", "2", "3"], "values": ["0.1", "0.2", "0.3"]}, it will fail. [Bug]: When importing a sparse vector, if the format is coordinate List, it will fail. May 20, 2024
@zhuwenxing
Contributor Author

If the file format is JSON, the error is:

[2024-05-20 14:54:11 - INFO - ci_test]: bulk insert state:False in 3.1445157527923584 with states:{449885606406276005: <Bulk insert state:
    - taskID          : 449885606406276005,
    - state           : Failed,
    - row_count       : 0,
    - infos           : {'failed_reason': 'strconv.ParseUint: parsing "values": invalid syntax', 'progress_percent': '0'},
    - id_ranges       : [],
    - create_ts       : 2024-05-20 14:54:08
>} (test_bulk_insert.py:1381)
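In the JSON-file case the message comes from Go's `strconv.ParseUint`. This suggests that the reader decoded the object as an index map and tried to parse each key, here `values`, as an unsigned integer. That is an inference; the thread does not confirm it. The Python sketch below reproduces that failure mode and adds a shape check that would let a reader choose the right decoder. The names `decode_index_map` and `is_coordinate_list` are hypothetical. If the Go side decodes into a map, Go randomizes map iteration order, which could explain why the error names `values` rather than `indices`.

```python
import json


def decode_index_map(text: str) -> dict[int, float]:
    """Hypothetical decoder that understands only the index-map form {"<index>": value}."""
    obj = json.loads(text)
    result: dict[int, float] = {}
    for key, value in obj.items():
        # Every key is assumed to be an unsigned integer. A coordinate-list
        # object breaks that assumption because its keys are "indices" and "values".
        if not (key.isascii() and key.isdigit()):
            raise ValueError(f"parsing {key!r}: invalid syntax")
        result[int(key)] = float(value)
    return result


def is_coordinate_list(obj: object) -> bool:
    """Return True if a decoded JSON value has the coordinate-list shape."""
    return isinstance(obj, dict) and set(obj) == {"indices", "values"}
```

Python dicts keep insertion order, so this sketch always reports the first key (`indices`). The Go error can name either key.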

@zhuwenxing zhuwenxing changed the title [Bug]: When importing a sparse vector, if the format is coordinate List, it will fail. [Bug]: When importing a sparse vector, if the format is coordinate list, it will fail. May 20, 2024
@zhuwenxing
Contributor Author

Still fails with the JSON file format:
https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-33166/2/pipeline/

[2024-05-21T08:34:44.332Z] [2024-05-21 08:30:56 - INFO - ci_test]: {449912240745937129: <Bulk insert state:
[2024-05-21T08:34:44.332Z]     - taskID          : 449912240745937129,
[2024-05-21T08:34:44.332Z]     - state           : Failed,
[2024-05-21T08:34:44.332Z]     - row_count       : 0,
[2024-05-21T08:34:44.332Z]     - infos           : {'failed_reason': 'invalid index type: 345(json.Number)', 'progress_percent': '0'},
[2024-05-21T08:34:44.332Z]     - id_ranges       : [],
[2024-05-21T08:34:44.332Z]     - create_ts       : 2024-05-21 08:30:51
[2024-05-21T08:34:44.332Z] >} (utility_wrapper.py:150)
[2024-05-21T08:34:44.332Z] [2024-05-21 08:30:56 - INFO - ci_test]: wait for bulk load tasks completed failed, cost time: 4.009913682937622 (utility_wrapper.py:155)
[2024-05-21T08:34:44.332Z] [2024-05-21 08:30:56 - INFO - ci_test]: bulk insert state:False in 5.019287824630737 with states:{449912240745937129: <Bulk insert state:
[2024-05-21T08:34:44.332Z]     - taskID          : 449912240745937129,
[2024-05-21T08:34:44.332Z]     - state           : Failed,
[2024-05-21T08:34:44.332Z]     - row_count       : 0,
[2024-05-21T08:34:44.332Z]     - infos           : {'failed_reason': 'invalid index type: 345(json.Number)', 'progress_percent': '0'},
[2024-05-21T08:34:44.332Z]     - id_ranges       : [],
[2024-05-21T08:34:44.332Z]     - create_ts       : 2024-05-21 08:30:51
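This second failure, `invalid index type: 345(json.Number)`, points to a different problem: how index values are typed after decoding. In Go, a `json.Decoder` configured with `UseNumber` yields `json.Number`, a string holding the numeric literal, instead of `float64`. A type switch that handles only Go numeric types would reject such values. Whether the Milvus reader is configured this way is an assumption here. Below is a minimal Python sketch of the normalization such a parser needs. The name `normalize_index` and the uint32 range are hypothetical. The helper accepts an integer, an integral float, or a numeric string (the closest Python analogue of `json.Number`) and rejects everything else.

```python
from decimal import Decimal, InvalidOperation


def normalize_index(raw: object) -> int:
    """Hypothetical normalizer: turn a decoded JSON index into a uint32-range int."""
    if isinstance(raw, bool):
        # bool is a subclass of int in Python; never treat True/False as an index.
        raise TypeError(f"invalid index type: {raw!r}")
    if isinstance(raw, int):
        index = raw
    elif isinstance(raw, (float, str)):
        # Numeric strings play the role of Go's json.Number (the literal text).
        try:
            number = Decimal(raw)
        except InvalidOperation:
            raise ValueError(f"invalid index: {raw!r}") from None
        if not number.is_finite() or number != number.to_integral_value():
            raise ValueError(f"index is not an integer: {raw!r}")
        index = int(number)
    else:
        raise TypeError(f"invalid index type: {raw!r} ({type(raw).__name__})")
    if not 0 <= index < 2**32:  # assumed uint32 index range
        raise ValueError(f"index out of range: {index}")
    return index
```

Parsing through `Decimal` rather than `int()` lets the helper accept literals such as `"345.0"` or `"3.45e2"` while still rejecting non-integral ones like `"3.5"`.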

sre-ci-robot pushed a commit that referenced this issue May 23, 2024
Issue: #33162

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
@yanliang567 yanliang567 modified the milestones: 2.4.2, 2.4.3, 2.4.4 May 24, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.4, 2.4.5 Jun 5, 2024
@zhuwenxing
Contributor Author

Verified and fixed.
