Add batch_size(batch_size) to __find_in_batches (Mongoid) #1036

Open
wants to merge 1 commit into
base: main

sylvain-8422 commented Jul 11, 2022

Add .batch_size(batch_size) to #__find_in_batches (Mongoid).

Fixes #1037.

Although .each_slice(batch_size) is useful for limiting how many documents are sent to Elasticsearch at a time, it does not limit the batch size of MongoDB's getMore commands.

By default, iterating over a MongoDB cursor first returns a batch of 101 documents, and each subsequent batch can hold up to 16 MiB of documents:

https://www.mongodb.com/docs/manual/tutorial/iterate-a-cursor/#cursor-batches

For example, a MongoDB collection containing documents averaging 1 KiB might return more than 16,000 documents at a time.
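The figure above follows from simple division. A rough sketch of the estimate, assuming the 16 MiB cap on a getMore batch and a 1 KiB average document size (the method and constant names are only illustrative):

```ruby
# Rough estimate of how many documents fit in one default getMore batch.
MIB = 1024 * 1024
KIB = 1024

# Integer division: number of whole documents that fit under the byte cap.
def docs_per_batch(max_batch_bytes, avg_doc_bytes)
  max_batch_bytes / avg_doc_bytes
end

# 16 MiB cap, documents averaging 1 KiB (assumed average size).
puts docs_per_batch(16 * MIB, 1 * KIB) # prints 16384
```

Smaller documents make the problem worse: at 512 bytes per document, the same cap allows roughly twice as many documents per batch.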

Although Mongoid's documentation claims a default batch size of 1,000 documents, that does not appear to be the case in practice.

Also, Mongoid's .no_timeout is broken right now and does nothing:

mongodb/mongo-ruby-driver#2557

As a result, with batches this large it is likely that more than 10 minutes go by between two getMore commands, at which point the MongoDB cursor expires.
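To see when this becomes a problem, here is a back-of-the-envelope check. It assumes the 10-minute idle limit mentioned above and a hypothetical indexing rate of 20 documents per second; the rate is made up for illustration and should be replaced with a measured one.

```ruby
# Idle limit between two getMore commands, per the description above.
CURSOR_IDLE_TIMEOUT_SECONDS = 10 * 60

# Time spent processing one batch before the next getMore is issued.
def seconds_between_get_mores(docs_per_batch, docs_indexed_per_second)
  docs_per_batch.to_f / docs_indexed_per_second
end

# True when one batch takes longer to process than the cursor may stay idle.
def cursor_at_risk?(docs_per_batch, docs_indexed_per_second)
  seconds_between_get_mores(docs_per_batch, docs_indexed_per_second) >
    CURSOR_IDLE_TIMEOUT_SECONDS
end

rate = 20 # hypothetical documents indexed per second

puts seconds_between_get_mores(16_384, rate) # prints 819.2
puts cursor_at_risk?(16_384, rate)           # prints true
puts cursor_at_risk?(1_000, rate)            # prints false
```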

Adding .batch_size(batch_size) to the query ensures that MongoDB documents are retrieved at the same rate as they are processed and indexed in Elasticsearch. It also allows applications affected by the .no_timeout issue to reduce the batch size to avoid cursor timeouts.
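The difference can be illustrated with a toy model. FakeCursor below is a hypothetical stand-in written for this example, not the Mongoid or Ruby driver API; it only mimics the fetch pattern described above (101 documents first, then large batches, assumed here to be 16,384 documents each) and records the size of every simulated round trip.

```ruby
# Toy stand-in for a MongoDB cursor. NOT the Mongoid or driver API.
class FakeCursor
  include Enumerable

  attr_reader :fetches # size of each simulated server round trip

  def initialize(total_docs, first_batch: 101, default_batch: 16_384)
    @total_docs = total_docs
    @first_batch = first_batch
    @default_batch = default_batch
    @batch_size = nil
    @fetches = []
  end

  # Mimics a chainable batch_size setter.
  def batch_size(size)
    @batch_size = size
    self
  end

  def each
    return enum_for(:each) unless block_given?

    @fetches = []
    sent = 0
    while sent < @total_docs
      # Without an explicit batch size, fall back to the default pattern.
      limit = @batch_size || (sent.zero? ? @first_batch : @default_batch)
      count = [limit, @total_docs - sent].min
      @fetches << count
      count.times { |i| yield sent + i }
      sent += count
    end
  end
end

# each_slice alone: slices of 1,000 reach the block, but fetches stay large.
default = FakeCursor.new(50_000)
default.each_slice(1_000) { |docs| docs.size }
p default.fetches # prints [101, 16384, 16384, 16384, 747]

# batch_size plus each_slice: fetches line up with the slices.
tuned = FakeCursor.new(50_000).batch_size(1_000)
tuned.each_slice(1_000) { |docs| docs.size }
p tuned.fetches.uniq # prints [1000]
p tuned.fetches.size # prints 50
```

In the first case a single fetch feeds many slices, so the gap between fetches grows with the batch. In the second, each slice corresponds to one fetch.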

sylvain-8422 commented Jul 6, 2023

@shashankjo

Same simple change as before, but I fixed the conflict created by whitespace changes in main.

From ef8985e to aa38a1b.

Successfully merging this pull request may close these issues.

Batch size is ignored when fetching documents from MongoDB