Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent Cache.download() deleting an expired file before knowing that it can download a new version of the file #7265

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

samish55
Copy link

@samish55 samish55 commented Oct 26, 2023

PR Description

I've attempted to handle issue #7249. The expired file won't be deleted until the file is successfully downloaded and if the download fails(for some reason) an error warning is emitted about working with an expired file in cache.

Fixes #7249

Copy link
Member

@Cadair Cadair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR @samish55

I think you need to refactor the logic a little bit to be sure the old file doesn't get deleted.

sunpy/data/data_manager/cache.py Outdated Show resolved Hide resolved
sunpy/data/data_manager/cache.py Show resolved Hide resolved
@nabobalis
Copy link
Contributor

Thanks for the code. I think ideally we need a unit test to make sure this behavior is working.

It is going to be tricky to do as you will need to monkey patch several calls (I think) to fake the download and make it fail.

@nabobalis nabobalis marked this pull request as draft November 27, 2023 19:00
@nabobalis nabobalis changed the title Updated Cache.download() for issue: #7249 Prevent Cache.download() deleting an expired file before knowing that it can download a new version of the file Mar 29, 2024
@nabobalis nabobalis added data Affects the data submodule No Backport A PR that isn't to be backported to any release branch. (To be used as a flag to other maintainers) Needs Review Needs reviews before merge labels Mar 29, 2024
@nabobalis nabobalis marked this pull request as ready for review March 29, 2024 21:54
@nabobalis nabobalis added Minor Change PR only needs one approval to merge and removed Needs Review Needs reviews before merge labels Apr 24, 2024
nabobalis
nabobalis previously approved these changes Apr 24, 2024
@@ -40,56 +39,60 @@ def download(self, urls, namespace='', redownload=False):
"""
Downloads the files from the urls.

The overall flow of this function is:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now just in the code as a comment instead of in both places.

Comment on lines +79 to +80
if cache_details and (redownload or self._has_expired(cache_details)):
self._storage.delete_by_key('url', cache_details['url'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is logic problem here:

  1. We do not remove any old files from disk
  2. If the file changes on disk, we could remove it here but then it will break logic in the manager as we need to hash the file a second time.

@nabobalis nabobalis removed the Minor Change PR only needs one approval to merge label Apr 25, 2024
@nabobalis nabobalis marked this pull request as draft April 25, 2024 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data Affects the data submodule No Backport A PR that isn't to be backported to any release branch. (To be used as a flag to other maintainers)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache.download() deletes an expired file before knowing that it can download a new version of the file
3 participants