Replies: 1 comment 4 replies
This is the intended default behavior. You can change it by passing an […]. You can archive the items that you wish to remove by upserting them on their id.
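The upsert-on-id semantics described above can be sketched with an in-memory stand-in. Note that `DatasetStore`, `upsert_item`, and the `status` field are hypothetical names for illustration, not the library's actual API:

```python
# Sketch of upsert-on-id semantics (all names here are hypothetical).
class DatasetStore:
    def __init__(self):
        self.items = {}  # maps item id -> item record

    def upsert_item(self, item_id, content, status="ACTIVE"):
        # Upserting on an existing id replaces the stored record rather
        # than creating a duplicate, so the same call can also flip an
        # item's status to ARCHIVED (a soft delete).
        self.items[item_id] = {"content": content, "status": status}

    def active_count(self):
        return sum(1 for it in self.items.values() if it["status"] == "ACTIVE")


store = DatasetStore()
store.upsert_item("item-1", "question A")
store.upsert_item("item-1", "question A")                      # same id: no duplicate
store.upsert_item("item-1", "question A", status="ARCHIVED")   # soft-remove it
```

The key point is that duplicates only arise when each run of a loading script generates fresh ids; upserting on a stable id is idempotent.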
Hello,
Recently, by accident, we ended up loading one of our datasets with duplicates.
This happened because we ran a script that updates the retrieval filter 4-5 times, and it contained the following lines:
Once we realized what happened, we deleted those lines from our script.
Our original dataset used to contain 111 items, but now it contains 579 items because of this.
Three questions:
How would you recommend reverting our dataset to its original state? We want to avoid deleting and recreating it, because we would lose the info on our previous runs. We also don't want to archive it and create a new one under a different name, because that would break dependencies (Hugging Face, Pinecone).
Is it a bug or intended behavior that an existing dataset can be filled with additional data? I would rather get an error when running the lines of code above than have my dataset silently grow.
Is it intended that a dataset can contain duplicate items, or is that a bug?
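One way to revert without deleting the dataset, assuming the duplicates have identical content but distinct ids, is to first compute which ids are redundant and then archive those via the upsert mechanism. Below is a sketch of only the selection step; the archiving call itself depends on the library's API, and the `(id, content)` pair shape is an assumption:

```python
from collections import defaultdict

def duplicate_ids_to_archive(items):
    """Group items by content, keep the first id seen for each distinct
    content, and return the remaining ids as candidates for archiving.
    `items` is a list of (item_id, content) pairs."""
    by_content = defaultdict(list)
    for item_id, content in items:
        by_content[content].append(item_id)
    # Everything after the first id in each group is a duplicate.
    return [item_id for ids in by_content.values() for item_id in ids[1:]]
```

Applied to the situation above, this would return 468 ids (579 total minus the 111 originals), which could then be archived one by one.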