Demux + multi-stream isn't a well supported situation #1649

Open
Tilps opened this issue Oct 16, 2021 · 2 comments
Comments

@Tilps
Contributor

Tilps commented Oct 16, 2021

Multi-stream lets us get more out of the GPU by overlapping requests. Even with only 2 search threads on a single GPU it can give a bonus, and sometimes 3 search threads may even start to make sense.
The demux backend implementation gets in the way of this. Each search thread's request gets split into parts that exactly cover the number of worker threads servicing the real backends, so if another search thread comes along, all workers should be expected to be blocked and thus no overlapping occurs.
It is possible to 'force' overlapping by increasing the minibatch size and doubling the number of workers that the demux backend creates, but this assumes that gathering batches is efficient, and it probably misses out on 'continuous overlapping', so there is probably a small amount of performance left on the table. The batch size required may also be larger than is actually needed in practice.
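
As a rough illustration of the splitting arithmetic described above (a sketch only, not the actual demux code; `split_batch` and its even-split rule are assumptions made for the example): one request is cut into as many parts as there are workers, so a single request occupies every worker, and doubling the worker count halves each part unless the minibatch size is doubled too.

```python
from math import ceil


def split_batch(batch_size, num_workers):
    """Hypothetical helper: cut one request into at most num_workers near-equal parts."""
    if batch_size <= 0:
        return []
    part = ceil(batch_size / num_workers)
    return [min(part, batch_size - start) for start in range(0, batch_size, part)]


# One search thread's request covers all 4 workers, so none is left idle
# for a second search thread to overlap with.
print(split_batch(256, 4))
# Doubling the workers halves the size of each part ...
print(split_batch(256, 8))
# ... unless the minibatch size is doubled as well.
print(split_batch(512, 8))
```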

If demux instead split tasks into separate per-GPU pools, then setting threads-per-GPU equal to the number of search threads would give an experience much closer to what a single GPU gets with multi-stream.
Possibly this should be a completely separate new backend, as the per-GPU pool logic is quite different from how demux currently works and may affect some esoteric use cases that the current demux supports.
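
One way to picture the per-GPU pool idea, assuming each request is split into one part per GPU and each part is placed on that GPU's own queue (`route_per_gpu` is a hypothetical name, and this is a plain-Python model rather than backend code):

```python
from collections import deque
from math import ceil


def route_per_gpu(requests, num_gpus):
    """Hypothetical model: requests maps a search-thread name to its batch size.

    Returns one queue per GPU holding (request name, part size) entries.
    """
    queues = [deque() for _ in range(num_gpus)]
    for name, batch_size in requests.items():
        if batch_size <= 0:
            continue
        part = ceil(batch_size / num_gpus)
        # Part i of every request always goes to GPU i's own pool.
        for gpu, start in enumerate(range(0, batch_size, part)):
            queues[gpu].append((name, min(part, batch_size - start)))
    return queues


queues = route_per_gpu({"search0": 128, "search1": 128}, num_gpus=2)
for gpu, queue in enumerate(queues):
    print(gpu, list(queue))
```

In this model each GPU's queue ends up holding one part from each search thread, so with two workers per GPU both parts could be in flight on the same GPU at once, which is the overlap that multi-stream is meant to exploit.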

@borg323
Member

borg323 commented Oct 20, 2021

I think what is needed for demux to work OK with multi-stream is to set the minimum-split-size to the expected batch size per GPU. Then increasing the demux threads will not reduce the batch size used for each GPU.
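
A minimal sketch of the effect being described, assuming the minimum split size simply acts as a floor on the size of each part (`split_with_minimum` is a hypothetical helper and may not match how demux applies the option):

```python
from math import ceil


def split_with_minimum(batch_size, num_workers, min_split):
    """Hypothetical helper: split a request, never making a part smaller than min_split
    (except for a final remainder)."""
    if batch_size <= 0:
        return []
    part = max(ceil(batch_size / num_workers), min_split)
    return [min(part, batch_size - start) for start in range(0, batch_size, part)]


# 2 workers, batch of 128: two parts of 64.
print(split_with_minimum(128, 2, min_split=64))
# 4 workers with the floor set to 64: still two parts of 64.
print(split_with_minimum(128, 4, min_split=64))
# 4 workers with no effective floor: the parts shrink to 32.
print(split_with_minimum(128, 4, min_split=1))
```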

@Tilps
Contributor Author

Tilps commented Oct 20, 2021

Possibly better than what is achievable without minimum-split-size, but nothing stops two workers from the same GPU picking up both splits instead of the splits being shared equally between the GPUs, so I still think we can do better.
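
To put a rough number on that concern, the sketch below enumerates which workers could pick up the splits when all workers share one pool. It assumes every set of workers is equally likely to take the splits, which is a simplification; `lopsided_fraction` and the worker layout are hypothetical.

```python
from itertools import combinations


def lopsided_fraction(worker_gpus, num_splits):
    """worker_gpus[i] is the GPU that worker i serves (hypothetical layout).

    Returns (pick-ups where every split lands on one GPU, total possible pick-ups).
    """
    picks = list(combinations(range(len(worker_gpus)), num_splits))
    lopsided = [p for p in picks if len({worker_gpus[w] for w in p}) == 1]
    return len(lopsided), len(picks)


# 2 GPUs with 2 workers each, and a request that was cut into 2 splits.
bad, total = lopsided_fraction([0, 0, 1, 1], num_splits=2)
print(bad, "of", total, "pick-ups leave one GPU idle")
```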
