When there are many files the run takes dozens of hours, and a single error or unexpected termination corrupts the database, forcing a full rebuild. Save the database after every 100 files processed #4040
Comments
To address your concerns about saving progress every 100 files and implementing multithreading for adding entries to the vector database, here's a concise guide.

Example implementation:

```python
import threading
from queue import Queue, Empty

def process_files(queue, vs_name):
    while True:
        try:
            # get_nowait() avoids the race between an empty() check and get()
            name, embeddings = queue.get_nowait()
        except Empty:
            break
        # Add your file processing logic here, including updating the vector database
        queue.task_done()

def main():
    files_to_process = [...]  # Your list of files
    queue = Queue()
    # Load tasks into the queue
    for file in files_to_process:
        embeddings = load_local_embeddings()  # Function to load embeddings for a file
        queue.put((file, embeddings))
    # Start worker threads
    num_worker_threads = 10
    for i in range(num_worker_threads):
        t = threading.Thread(target=process_files, args=(queue, 'your_vector_store_name'))
        t.start()
    queue.join()  # Wait for all tasks to be processed

if __name__ == "__main__":
    main()
```

This setup uses a queue of pre-loaded tasks drained by multiple worker threads, so files are added to the vector store concurrently instead of one at a time.
A follow-up comment showed this CLI output:

```
$ python init_database.py -help
please specify only one operate method once time.
options:
```
When there are many files, the run takes dozens of hours, and if an error or unexpected termination occurs, the database is corrupted and has to be regenerated from scratch. Where should the code be modified so that progress is saved after every 100 files processed, or so that entries are added to the vector database with multiple threads?