Add cleanup jobs as Celery periodic tasks #579

Open: wants to merge 21 commits into beta

@noliveleger (Contributor) commented Oct 2, 2019

Two periodic tasks have been added:

  • remove_storage_orphans: Removes every attachment and export in the storage bucket that is not related to a record in the database.
  • remove_revisions: Removes every revision older than --days days (0 by default, but forced to 90 in the periodic task).
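For readers unfamiliar with the two jobs, their core selection logic can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code in this PR: it assumes the storage keys and the keys referenced by database records are available as plain strings, and that each revision is a dict with a timezone-aware `date_created`. `find_storage_orphans`, `revision_cutoff`, and `revisions_to_remove` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Set


def find_storage_orphans(stored_keys: Iterable[str],
                         referenced_keys: Iterable[str]) -> Set[str]:
    """Keys present in the storage bucket that no DB record points to.

    Hypothetical helper: `stored_keys` would come from listing the bucket,
    `referenced_keys` from attachment and export records.
    """
    return set(stored_keys) - set(referenced_keys)


def revision_cutoff(days: int, now: Optional[datetime] = None) -> datetime:
    """Revisions created strictly before this moment are eligible for removal."""
    if days < 0:
        raise ValueError("days must be >= 0")
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)


def revisions_to_remove(revisions: Iterable[dict], days: int = 0,
                        now: Optional[datetime] = None) -> List[dict]:
    """days=0 mirrors the command's default: every revision older than now."""
    cutoff = revision_cutoff(days, now)
    return [rev for rev in revisions if rev["date_created"] < cutoff]
```

The periodic task would call the revision logic with `days=90`. The set difference is the simplest way to express "not related to a record", but it is only as safe as `referenced_keys` is complete: files belonging to any record type missed there would be deleted.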

Closes kobotoolbox/kobo-docker#253
Related to kobotoolbox/kpi#2434 and #550.

Note by @jnm: there's effectively another (!) type of orphaned attachment in kobocat; consider:

  1. Create a submission with attachment 1.jpg;
  2. Edit the submission and replace 1.jpg with 2.jpg;
  3. Now both 1.jpg and 2.jpg are Attachments in the database, and both files remain in storage.

It's not the intent of this PR to handle this type of orphan, but I wanted to note it here for reference. See #792 for handling of this issue.
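To make the edit-replacement case concrete, one way to spot such superseded attachments is to compare a submission's attachment records against the values its current XML still mentions. This is a hypothetical sketch, not the approach taken in #792. It assumes attachments are given as an `{id: filename}` mapping holding bare filenames as they appear in the XML, and it treats any element text as a possible filename reference, which over-counts references and therefore errs toward keeping files.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def referenced_values(xml_text: str) -> Set[str]:
    """All non-empty element text in the submission XML.

    Deliberately broad: matching against every value over-counts references,
    so the check errs toward keeping an attachment.
    """
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter() if el.text and el.text.strip()}


def superseded_attachments(attachments: Dict[int, str], xml_text: str) -> List[int]:
    """IDs of attachments whose filename the current submission no longer mentions."""
    referenced = referenced_values(xml_text)
    return sorted(att_id for att_id, name in attachments.items()
                  if name not in referenced)
```

For the scenario above, after the edit the XML would contain `<photo>2.jpg</photo>`, so `superseded_attachments({1: "1.jpg", 2: "2.jpg"}, xml)` flags only attachment 1.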

@noliveleger noliveleger requested a review from jnm October 2, 2019 20:23
@noliveleger noliveleger changed the title from Cron cleanup jobs to Add cleanup jobs as Celery periodic tasks Oct 2, 2019
@jnm jnm assigned dorey and unassigned jnm Mar 16, 2020
@noliveleger noliveleger changed the base branch from master to 2155_kpi_two_databases March 20, 2020 14:48
@noliveleger noliveleger changed the base branch from 2155_kpi_two_databases to master April 30, 2020 14:22
 # Conflicts:
 #	onadata/apps/logger/management/commands/remove_duplicated_submissions.py
 #	onadata/apps/logger/tasks.py
 #	onadata/libs/utils/gravatar.py
 #	onadata/libs/utils/redis_helper.py
 #	onadata/settings/base.py
 #	onadata/settings/kc_environ.py
@noliveleger noliveleger changed the base branch from master to beta February 18, 2022 15:46
 # Conflicts:
 #	onadata/libs/utils/redis_helper.py
 #	onadata/settings/base.py
 #	onadata/settings/dev.py
 #	onadata/settings/prod.py

@jnm (Member) left a comment

Not much of a "review", but:

  • Could we split the orphan attachments job into a separate PR? That one scares me.
  • Removing revisions is 👍, but can we set the default to 90 days?
  • Could we add a Celery task that chips away at the huge backlog of old revisions a little at a time, perhaps with a limited execution timeout?
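The "chip away" idea amounts to deleting in bounded batches until a time budget runs out, leaving the remainder for the next scheduled run. A minimal sketch, assuming hypothetical `fetch_batch` and `delete_batch` callables (in Django these might wrap a query for the oldest expired revision IDs and a delete on those IDs). Celery's per-task `soft_time_limit` and `time_limit` options could act as a hard backstop, but checking a deadline between batches lets each batch finish cleanly.

```python
import time
from typing import Callable, Sequence


def chip_away(fetch_batch: Callable[[int], Sequence[int]],
              delete_batch: Callable[[Sequence[int]], None],
              batch_size: int = 1000,
              time_budget: float = 60.0,
              clock: Callable[[], float] = time.monotonic) -> int:
    """Delete batches until nothing is left or the time budget is spent.

    Returns the number of items deleted in this run; whatever remains is
    picked up by the next periodic run.
    """
    deadline = clock() + time_budget
    deleted = 0
    # The deadline is only checked between batches, so a batch that has
    # started always completes.
    while clock() < deadline:
        ids = fetch_batch(batch_size)
        if not ids:
            break  # backlog fully cleared
        delete_batch(ids)
        deleted += len(ids)
    return deleted
```

Keeping `batch_size` small bounds how long each delete holds locks, while `time_budget` bounds the whole run.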

@bufke to have a look

Successfully merging this pull request may close these issues:

  • Add cron task for KPI and KoBoCAT cleanup jobs

3 participants