Feature request: Improve LRE re-queuing logic #1588

Open
smathermather opened this issue Apr 19, 2021 · 3 comments
Comments

@smathermather
Contributor

With split-merge, in the current implementation, available nodes are filled, but they are not queried again for availability once they finish processing. If the ClusterODM node, after it finishes downloading a completed job, checked for freed nodes and sent cached jobs out to them, jobs where the number of submodels exceeds the number of nodes could run substantially faster.

@smathermather smathermather changed the title Improve re-queuing logic Feature request: Improve re-queuing logic Apr 19, 2021
@smathermather
Contributor Author

For jobs that have substantially more submodels than processing nodes, I have taken to killing the processing when I get down to just a few submodels running the OpenSfM stage, and then restarting it at the OpenSfM stage. The process detects which submodels are complete and which are still to run, and then sends out jobs to all the available nodes, thus filling up the queue again. It's hackish, messy, volatile, and dangerous, but it gets the job done.

@pierotofy
Member

> but then don't get queried again for availability when done processing

This is strange; in any case, the logic at fault is probably not in ClusterODM, but in the LRE module in ODM. The LRE should take care of queuing tasks (fill up all available slots), then wait until slots become available.

Would be good to document a test case (with a small dataset) that can be reproduced easily on a development machine.
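
The queuing behaviour described above (fill up all available slots, then wait until slots become available) can be sketched as a small simulation. This is only an illustration under stated assumptions, not the actual LRE code in ODM: the function name `run_with_requeue`, the `node_slots` mapping, and the per-submodel durations are all hypothetical.

```python
import heapq
from collections import deque


def run_with_requeue(durations, node_slots):
    """Simulate dispatching submodels to nodes, refilling slots as they free up.

    durations:  processing time of each submodel (hypothetical units).
    node_slots: dict mapping a node name to its number of parallel slots.
    Returns (makespan, assignments), where assignments maps
    submodel index -> node name.
    """
    pending = deque(range(len(durations)))
    # One entry per free slot, so a node with 2 slots appears twice.
    free = [name for name, slots in node_slots.items() for _ in range(slots)]
    if pending and not free:
        raise ValueError("no slots available to run any submodel")

    running = []  # min-heap of (finish_time, submodel_index, node)
    assignments = {}
    now = 0.0

    while pending or running:
        # Fill every free slot with the next pending submodel.
        while pending and free:
            idx = pending.popleft()
            node = free.pop(0)
            assignments[idx] = node
            heapq.heappush(running, (now + durations[idx], idx, node))
        # Wait for the earliest running submodel, then release its slot
        # so the next loop iteration can hand it a new submodel.
        finish, _, node = heapq.heappop(running)
        now = finish
        free.append(node)

    return now, assignments
```

For example, `run_with_requeue([3, 1, 1, 1], {"a": 1, "b": 1})` keeps node `b` busy with the three short submodels while node `a` works on the long one, instead of leaving `b` idle after its first task.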

@smathermather
Contributor Author

smathermather commented Apr 19, 2021

You know I don't have any small datasets!

In all seriousness, I think all that's needed to replicate this is to choose split settings on any dataset so that the number of submodels exceeds the number of nodes. It is probably easiest to observe when the number of submodels is roughly twice the number of nodes, since the trailing 1 or 2 submodels then stand out.

And if you want me to open this on ODM instead, I can. I keep forgetting that the LRE logic lives there and that ClusterODM tries to be pretty agnostic.
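
To size such a reproduction, here is a rough helper for picking a split value that yields about twice as many submodels as nodes. It assumes the split setting is the average number of images per submodel (as I understand ODM's `--split` option), so the resulting submodel count is only approximate. The function name and the `ratio` parameter are hypothetical.

```python
import math


def split_for_target(n_images, n_nodes, ratio=2):
    """Return an images-per-submodel value giving roughly ratio * n_nodes submodels.

    Assumes the split setting is an *average* images-per-submodel figure,
    so the real number of submodels may differ slightly.
    """
    if n_images <= 0 or n_nodes <= 0 or ratio <= 0:
        raise ValueError("n_images, n_nodes and ratio must be positive")
    target_submodels = n_nodes * ratio
    return math.ceil(n_images / target_submodels)
```

With 120 images and 3 nodes this suggests a split value of 20, i.e. about 6 submodels, which should leave a visible tail of 1 or 2 submodels once the first round of nodes finishes.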

@pierotofy pierotofy transferred this issue from OpenDroneMap/ClusterODM Jan 25, 2023
@pierotofy pierotofy changed the title Feature request: Improve re-queuing logic Feature request: Improve LRE re-queuing logic Jan 25, 2023