The add_license_url DAG keeps timing out #4348

Closed
krysal opened this issue May 16, 2024 · 0 comments · Fixed by #4370
Assignees
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs 🔧 tech: airflow Involves Apache Airflow

Comments

@krysal
Member

krysal commented May 16, 2024

Description

This DAG keeps timing out, for reasons not yet identified, when the number of rows to modify is relatively high (>500k). By contrast, the batched_update DAG has been verified to handle this kind of update for loads of millions of rows. It was tested by backfilling the license (by-nc-sa, 2.0), and it successfully updated 11,090,909 records.

However, repeated runs show these licenses reappearing among the rows that are missing the field, so some ingestion flows may not be filling in this data, or there may be another problem (#4318). I'd like to update the add_license_url DAG to use the batched_update DAG and automate this process until we are sure all rows are complete.
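For illustration only, here is a minimal sketch of the batching idea. It is not the actual batched_update DAG. It assumes a hypothetical SQLite `image` table with flat `license`, `license_version`, and `license_url` columns (the real catalog schema differs), plus a hypothetical `license_url` helper that builds Creative Commons URLs. Rows missing a URL are selected in fixed-size batches keyed on `id`, and each batch is committed in its own short transaction rather than as one long-running UPDATE.

```python
import sqlite3


def license_url(license_: str, version: str) -> str:
    """Hypothetical helper: build a Creative Commons URL from license and version."""
    if license_ == "cc0":
        return f"https://creativecommons.org/publicdomain/zero/{version}/"
    if license_ == "pdm":
        return f"https://creativecommons.org/publicdomain/mark/{version}/"
    return f"https://creativecommons.org/licenses/{license_}/{version}/"


def backfill_license_url(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Fill missing license_url values batch by batch; return rows updated.

    Assumes a hypothetical table:
    image(id INTEGER PRIMARY KEY, license TEXT, license_version TEXT, license_url TEXT)
    """
    updated = 0
    last_id = 0
    while True:
        # Keyset pagination: resume after the last id seen instead of using OFFSET.
        rows = conn.execute(
            "SELECT id, license, license_version FROM image "
            "WHERE id > ? AND license_url IS NULL ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "UPDATE image SET license_url = ? WHERE id = ?",
            [(license_url(lic, ver), row_id) for row_id, lic, ver in rows],
        )
        conn.commit()  # one short transaction per batch
        updated += len(rows)
        last_id = rows[-1][0]
    return updated
```

Calling `backfill_license_url(conn, batch_size=1000)` on a table with 2,500 rows missing a URL would run three batches (1,000, 1,000, and 500 rows). Because each batch commits independently, a timeout or failure partway through leaves the earlier batches applied, and a rerun simply picks up the rows that are still missing the field.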

Additional context

Related to #3885 and #4318.

@krysal krysal added 🟨 priority: medium Not blocking but should be addressed soon 🛠 goal: fix Bug fix 💻 aspect: code Concerns the software code in the repository 🔧 tech: airflow Involves Apache Airflow 🧱 stack: catalog Related to the catalog and Airflow DAGs labels May 16, 2024
@krysal krysal self-assigned this May 16, 2024
Projects
Status: ✅ Done
Development


1 participant