
Ghost Tasks - Tasks in Permanent Queue W/No Active Process in Web Client #1503

Open
skyybleu opened this issue May 17, 2024 · 1 comment


skyybleu commented May 17, 2024

How did you install WebODM (docker, installer, etc.)?

I installed WebODM (and its prerequisites) via a Bash CLI script on Ubuntu 20.04 LTS.

What's your browser and operating system? (Copy/paste the output of https://www.whatismybrowser.com/)

Server: Ubuntu 20.04 LTS

Host: Windows 10 Enterprise | Firefox

What is the problem?

When a reboot is triggered while a task is processing, processes can sometimes hang for abnormally long times when power is restored. This is usually fixed by canceling the hanging task and restarting it.

However: with 8 tasks queued, cancelling the hanging task did not start processing on another task (or project). Instead, every project showed as queued, and further attempts to cancel the queued projects and relaunch the tasks left the queue length unchanged with no active process (again, I checked every project and every task).

Further investigation led to the discovery of a few symptoms:

  1. CPU/GPU/Memory/Disk usage were all elevated and the docker container appeared to be running when monitoring statistics from the server itself (remote desktop).
  2. The Docker GPU container took 20-30 seconds to spin up, when it normally takes about 3 seconds or less. It is launched with: [docker run -dp 3001:3000 --gpus all --name nodeodmgpu opendronemap/nodeodm:gpu || docker start nodeodmgpu && ../webodm.sh start]

The issue was eventually traced to the corresponding /var/lib/docker/overlay2/[Very-Long-ID]/diff/var/www/data/tasks.json file for the hanging node: the status codes recorded there for each task did not match what the web client showed, and they failed to re-sync during any reboot/stop/start.

I.e., a task that was cancelled and should read {"code":50} still showed {"code":10} (queued) or {"code":20} (processing?).
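To see how far the node's tasks.json has drifted from the web client, the status codes can be tallied with grep. This is a sketch run against a sample file; the layout of tasks.json and the meaning of the codes (10=queued, 20=running, 30=failed, 40=completed, 50=canceled) are assumptions based on NodeODM's documented status codes:

```shell
# Demo on a sample file; on a live node, point this at the real
# /var/lib/docker/overlay2/<id>/diff/var/www/data/tasks.json instead.
cat > /tmp/tasks_sample.json <<'EOF'
[{"uuid":"a","status":{"code":10}},{"uuid":"b","status":{"code":20}},{"uuid":"c","status":{"code":50}}]
EOF
# Tally tasks by status code; a nonzero count for 10 or 20 on a node that the
# web client says is idle is the de-sync described above.
grep -o '"code":[0-9]*' /tmp/tasks_sample.json | sort | uniq -c
```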

The solution was to stop the Docker container, issue a webodm.sh down, and then use the commands below to manually mark the tasks as cancelled:

sudo sed -i -e "s/:10}/:50}/g" /var/lib/docker/overlay2/[Very-Long-Node-ID]/diff/var/www/data/tasks.json
sudo sed -i -e "s/:20}/:50}/g" /var/lib/docker/overlay2/[Very-Long-Node-ID]/diff/var/www/data/tasks.json
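The two sed commands above can be combined into one pass, and it is worth dry-running the rewrite on a throwaway copy first so the result can be inspected before touching the real tasks.json. A minimal sketch (the sample JSON is fabricated for the demo):

```shell
# Fabricated stand-in for tasks.json; substitute the real overlay2 path on a node.
printf '%s\n' '{"a":{"code":10},"b":{"code":20},"c":{"code":40}}' > /tmp/tasks_copy.json
# One sed pass doing both substitutions: queued (10) and processing (20) -> canceled (50).
sed -i -e 's/:10}/:50}/g; s/:20}/:50}/g' /tmp/tasks_copy.json
cat /tmp/tasks_copy.json
```

Codes other than 10 and 20 (here, the completed task with code 40) are left untouched, which is what preserves the existing task list.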

The result was that the task list was retained and roughly 100 GB of task resources were not orphaned. Everything functions normally after restarting the services.

How can we reproduce this? (What steps trigger the problem? What parameters are you using for processing? Include screenshots. If you are having issues processing a dataset, you must include a copy of your dataset uploaded on Dropbox, Google Drive or https://dronedb.app)

It seemed to be triggered by an abrupt reboot from another user's account session, which may have prevented Docker from stopping/starting cleanly. This caused a de-sync between the web client's container information and the node: the node was still being told to process tasks that the web client showed as cancelled or queued.

I would recommend queueing a few tasks, killing the Docker process (or otherwise stopping the Docker container abruptly), and then cancelling the remaining tasks. This should replicate the issue, though I have not yet been able to test it because our production and test environments are both occupied.
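The steps above can be sketched as a script. This is hypothetical and untested (it needs a live WebODM + Docker host, which is why it is only written out and syntax-checked here); the container name nodeodmgpu follows the docker run command earlier in this report:

```shell
# Write the hypothetical repro script; it is NOT executed here.
cat > /tmp/webodm_repro.sh <<'EOF'
#!/bin/sh
# 1. Queue several tasks against the node from the WebODM web client first.
# 2. Stop the processing node abruptly, with no clean shutdown:
docker kill nodeodmgpu
# 3. In the web client, cancel the remaining queued tasks in rapid succession.
# 4. Bring everything back up; if the de-sync reproduces, the queue length
#    stays >0 with no active process:
docker start nodeodmgpu
./webodm.sh restart
EOF
# Only verify the script parses, since there is no live host in this sketch.
sh -n /tmp/webodm_repro.sh && echo "repro script parses"
```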

@skyybleu (Author)

Update: The issue also reproduces when cancelling projects in rapid succession. Clicking cancel on multiple queued projects replicates the de-sync about 50% of the time, and the queue length remains >0 until the manual fix above is applied. Reboots do not fix it, UNLESS the hung queue belongs to the default CPU processing node that is set up out of the box, in which case restarting WebODM from the terminal WILL correctly update and clear the queue.

Additional Docker containers launched for GPU processing will not automatically re-sync, and upon completion of the project they leave orphaned files in the /var/lib/docker/overlay2/ directory that are not viewable, deletable, or otherwise accessible from the web client. In that case the files must be removed manually after issuing a webodm.sh stop and a docker stop command.
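Locating which overlay2 layers still hold NodeODM task data can be done with find, since each candidate layer contains a diff/var/www/data directory. A sketch on a throwaway tree (the directory layout mirrors the tasks.json path above; on a live host, run the find as root against /var/lib/docker/overlay2 after stopping WebODM and the node):

```shell
# Fabricated demo tree standing in for /var/lib/docker/overlay2.
mkdir -p /tmp/overlay2-demo/abc123/diff/var/www/data
mkdir -p /tmp/overlay2-demo/def456/diff/etc
# List layers that contain NodeODM task data (candidates for orphaned files):
find /tmp/overlay2-demo -path '*/diff/var/www/data' -type d
```

Only the layer actually holding task data is reported; unrelated layers (like def456 here) are skipped.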
