Description
Over the last couple of weeks I've run into a rather tricky problem a few times: one DAG run gets "stuck" in the queued state, while subsequent DAG runs get stuck in running (screenshot below). In one case, max_active_runs had been reached after a task instance from an earlier DAG run was cleared, and one of the tasks had depends_on_past=True. The DAG run sat queued indefinitely until we realized that the task that wasn't getting scheduled needed the failed task in the preceding DAG run to be re-run, which in turn left the subsequent DAG runs stuck in running. This caused quite a bit of confusion and stress.
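To make the failure mode concrete, here is a simplified model of the two gates involved. This is not Airflow's actual scheduler code; the function names are hypothetical and exist only to illustrate how the cleared depends_on_past task and the max_active_runs cap interact:

```python
from typing import Optional

# Illustrative sketch only -- NOT Airflow's real scheduler logic.

def can_start_run(active_runs: int, max_active_runs: int) -> bool:
    """A new DAG run stays queued while the active-run cap is met."""
    return active_runs < max_active_runs

def can_schedule_task(depends_on_past: bool,
                      prev_ti_state: Optional[str]) -> bool:
    """With depends_on_past=True, a task instance waits until the same
    task in the previous DAG run succeeded (or was skipped)."""
    if not depends_on_past:
        return True
    return prev_ti_state in ("success", "skipped")

# The stuck scenario: the previous run's task failed (and was cleared),
# so the current run's task never becomes schedulable, and the run it
# belongs to keeps counting against max_active_runs.
print(can_schedule_task(True, "failed"))  # False: task sits in "no status"
print(can_start_run(1, 1))                # False: later runs stay queued
```

The deadlock is the combination: the blocked task keeps its run active, the active run keeps the cap full, and nothing in the UI says why.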
Ideally there would be a new status (or a rework of the no_status status), but a much simpler fix might be to surface the full task instance details in the grid view, or at least to add a task's dependencies to the task instance details there.
Use case/motivation
I don't have a solid understanding of what underlies the UI, but IMO it'd be best to surface the full task instance details view in the task instance details sub-view of the grid view.
If that's too technically challenging for now or if there are other issues (for example, performance issues), it'd suffice to at least include the task's dependencies in the task instance details sub-view of the grid view.
As part of #29856 we should bring the dependency info to the Grid view's TI details. It's a big dump of metadata, though, so I'd like to try to figure out a UX that makes it easier for a user to find the relevant details.
Perhaps whenever a task has any pending state we highlight the dependencies and otherwise it is collapsed by default.
What do you mean by "pending state"? Like scheduled & queued? I don't think that's sufficient, because the really tricky case IMO is when depends_on_past=True. The task will be in the "no status" state, and it's super confusing to figure out why a DAG is still running, but no tasks are moving to scheduled. This just "got" me again:
I was debugging a depends_on_past=True issue, and it took me a few minutes to realize what was going on when the task wasn't moving to scheduled, and I'm already aware of this issue. I think the solution should be to highlight dependencies if:
- dependencies exist, and
- the task is in one of no status, scheduled, or queued.
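The proposed rule can be sketched as a small predicate. The state names mirror Airflow's task-instance states ("no status" represented as None), but this is an assumption about how the UI might decide, not existing code:

```python
# Hypothetical sketch of the highlight rule above; not existing Airflow UI code.

# States in which a task is not yet running and its dependencies are
# likely to explain the holdup ("no status" is modeled as None).
PENDING_STATES = {None, "scheduled", "queued"}

def should_highlight_dependencies(has_dependencies: bool, state) -> bool:
    """Expand/highlight the dependency panel only when it is likely
    to explain why a task is not moving forward."""
    return has_dependencies and state in PENDING_STATES

print(should_highlight_dependencies(True, None))       # True: the depends_on_past trap
print(should_highlight_dependencies(True, "running"))  # False: already running
print(should_highlight_dependencies(False, "queued"))  # False: nothing to show
```

Including None is the key part: in the depends_on_past case the task never leaves "no status", so a rule keyed only on scheduled/queued would miss exactly the confusing case.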