Add indexes on dag_id column in referencing tables to speed up deletion of dag records #39638

Merged


pankajkoti
Member

@pankajkoti pankajkoti commented May 15, 2024

When the number of DAG records grows very large and users try to delete
DAGs and DAG runs that are stale or no longer needed, deletion is
observed to be significantly slow. The cause is slow cascading deletes:
although the referencing tables have foreign key constraints, those
constraints do not implicitly create an index on the referencing column
(dag_id in this case). This PR therefore creates an index on dag_id in
5 of the 6 referencing tables to speed up the cascading deletes. It skips
the 6th table, dag_owner_attributes, because CI fails to find the
constraint dag.dag_id for that table; I plan to follow up on that table
in a separate PR. Without these indexes, deleting those records was
observed to take many hours; with them, it takes a few seconds.
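
To illustrate the mechanism described above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. It is not the migration from this PR: the table dag_child, its payload column, and the index name idx_dag_child_dag_id are hypothetical, and the actual Airflow tables and backends are not reproduced. It only shows that a foreign key on dag_id does not by itself give the referencing table an index, and that the lookup a cascading delete relies on switches from a full scan to an index search once one is added.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail column of EXPLAIN QUERY PLAN, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table, plus a hypothetical referencing table with a cascading FK.
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE dag_child (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL REFERENCES dag (dag_id) ON DELETE CASCADE,
        payload TEXT
    )
    """
)
conn.execute("INSERT INTO dag VALUES ('example_dag')")
conn.executemany(
    "INSERT INTO dag_child (dag_id, payload) VALUES (?, ?)",
    [("example_dag", str(i)) for i in range(100)],
)

# The lookup a cascading delete must perform for each deleted parent row.
lookup = "SELECT * FROM dag_child WHERE dag_id = ?"

# The foreign key alone does not index dag_child.dag_id: full table scan.
plan_before = query_plan(conn, lookup, ("example_dag",))

# Hypothetical index on the referencing column.
conn.execute("CREATE INDEX idx_dag_child_dag_id ON dag_child (dag_id)")
plan_after = query_plan(conn, lookup, ("example_dag",))

# Deleting the parent row cascades to the referencing rows.
conn.execute("DELETE FROM dag WHERE dag_id = 'example_dag'")
remaining = conn.execute("SELECT COUNT(*) FROM dag_child").fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
print("child rows left after delete:", remaining)
```

The exact wording of the query plan differs between SQLite versions, so the sketch prints it rather than assuming a fixed string. Other database backends have their own rules about which indexes a foreign key implies, so the plan should be checked on the backend actually in use.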



@pankajkoti pankajkoti changed the title Add indexes on dag_id column in refencing tables to speed up deletion… Add indexes on dag_id column in refencing tables to speed up deletion of dag records May 15, 2024
@pankajkoti pankajkoti force-pushed the idx-optimise-slow-deletion-of-dags-1 branch from 26106be to fb83faf Compare May 15, 2024 11:39
@pankajkoti pankajkoti added this to the Airflow 2.9.2 milestone May 15, 2024
Contributor

@ephraimbuddy ephraimbuddy left a comment

LGTM

@pankajkoti pankajkoti marked this pull request as ready for review May 16, 2024 14:43
@pankajkoti pankajkoti changed the title Add indexes on dag_id column in refencing tables to speed up deletion of dag records Add indexes on dag_id column in referencing tables to speed up deletion of dag records May 16, 2024
@ephraimbuddy ephraimbuddy merged commit e8183a9 into apache:main May 17, 2024
42 checks passed
@ephraimbuddy ephraimbuddy deleted the idx-optimise-slow-deletion-of-dags-1 branch May 17, 2024 06:54

2 participants