
Add support for downloading from Google Cloud Storage #398

Open
1 of 4 tasks
remrama opened this issue Feb 22, 2024 · 1 comment
Labels
enhancement Idea or request for a new feature

Comments

@remrama

remrama commented Feb 22, 2024

Add a GCSDownloader that can fetch data from Google Cloud Storage (GCS). It should support an authentication token, ideally with the option to read it from an environment variable.
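A minimal sketch of the requested precedence: an explicit credential argument wins, otherwise an environment variable is read. The helper name `resolve_credentials` is hypothetical and not part of Pooch. The default variable `GOOGLE_APPLICATION_CREDENTIALS` is taken from the example later in this issue.

```python
import os

# Default taken from the example in this issue; google-cloud-storage itself
# also honours this variable through Application Default Credentials.
DEFAULT_ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def resolve_credentials(credentials=None, env_var=DEFAULT_ENV_VAR):
    """Return the credentials to use (hypothetical helper).

    A non-empty explicit argument takes priority. Otherwise the environment
    variable named by ``env_var`` is used. Raises ValueError if neither
    source provides a value.
    """
    if credentials:
        return credentials
    value = os.environ.get(env_var)
    if not value:
        raise ValueError(
            f"No credentials given and environment variable '{env_var}' is not set."
        )
    return value
```

Keeping the lookup in one place means the downloader's constructor only has to call it once. The error message names the variable so users know what to set.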

See the matching feature requests for other cloud storage services: Amazon's AWS (#363) and Microsoft's Azure (#382).

This would require:

  • A new downloader (GCSDownloader) in pooch/downloaders.py (see https://www.fatiando.org/pooch/latest/downloaders.html and the existing downloaders). Make sure to add it to the choose_downloader function so that Pooch can automatically select it based on the URL prefix (gs://).
  • The test data in our data folder uploaded to Google Cloud Storage so we can test that the downloader works.
  • Tests in pooch/tests/test_downloaders.py that check that the download works and that any errors that should be raised are actually raised.
  • Example documentation, probably in https://www.fatiando.org/pooch/latest/protocols.html
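The prefix-based lookup that choose_downloader needs can be sketched as a protocol-to-class table. The stub classes, the table, and `choose_downloader_sketch` are hypothetical stand-ins that illustrate the idea. They are not Pooch's actual code.

```python
from urllib.parse import urlsplit


class HTTPDownloaderStub:
    """Stand-in for Pooch's HTTPDownloader in this sketch."""

    def __init__(self, progressbar=False):
        self.progressbar = progressbar


class GCSDownloaderStub:
    """Stand-in for the proposed GCSDownloader in this sketch."""

    def __init__(self, progressbar=False):
        self.progressbar = progressbar


# Hypothetical protocol -> downloader table. Adding the "gs" entry is what
# lets a plain gs:// URL be routed to the new downloader automatically.
KNOWN_DOWNLOADERS = {
    "http": HTTPDownloaderStub,
    "https": HTTPDownloaderStub,
    "gs": GCSDownloaderStub,
}


def choose_downloader_sketch(url, progressbar=False):
    """Instantiate the downloader registered for the URL's protocol prefix."""
    protocol = urlsplit(url).scheme
    if protocol not in KNOWN_DOWNLOADERS:
        raise ValueError(
            f"Unrecognized URL protocol '{protocol}' in '{url}'. "
            f"Must be one of {sorted(KNOWN_DOWNLOADERS)}."
        )
    return KNOWN_DOWNLOADERS[protocol](progressbar=progressbar)
```

With the "gs" entry in place, `choose_downloader_sketch("gs://bucket_name/blob_name.txt")` returns a `GCSDownloaderStub`, while an unregistered prefix such as `s3://` raises `ValueError`.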

I've got a fully functional GCSDownloader class in a fork, minus the tests. It uses the google-cloud-storage package for authentication and downloading. The credentials can either be passed to the downloader or read from an environment variable. It also supports the tqdm progress bar option.

# Authorize by setting an environment variable
import os
import pooch
credentials = "google_app_credentials.json"
url = "gs://bucket_name/blob_name.txt"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credentials
filename = pooch.retrieve(url, known_hash=None)

# Authorize by passing credentials to the custom downloader
# (GCSDownloader here is the class from the fork described above)
from pooch import GCSDownloader
credentials = "google_app_credentials.json"
downloader = GCSDownloader(credentials=credentials)
filename = pooch.retrieve(url, known_hash=None, downloader=downloader)
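For illustration, here is a minimal sketch of what such a class could look like. It is not the code in the fork. It assumes Pooch's downloader calling convention `downloader(url, output_file, pooch, check_only=False)` and the google-cloud-storage calls `Client.from_service_account_json`, `Client.bucket(...).blob(...)`, `Blob.exists`, `Blob.download_to_filename` and `Blob.download_to_file`. The progress bar is left out. The `client` argument and the `parse_gs_url` helper are hypothetical additions that make the class testable without real credentials.

```python
import os
from urllib.parse import urlsplit


def parse_gs_url(url):
    """Split 'gs://bucket/path/to/blob' into (bucket, blob_name)."""
    parts = urlsplit(url)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"Not a valid gs:// URL: '{url}'")
    blob_name = parts.path.lstrip("/")
    if not blob_name:
        raise ValueError(f"No object name in URL: '{url}'")
    return parts.netloc, blob_name


class GCSDownloader:
    """Sketch of a Pooch-style downloader for gs:// URLs.

    credentials: optional path to a service-account JSON key file. When it
    is None, google-cloud-storage falls back to Application Default
    Credentials, which honour GOOGLE_APPLICATION_CREDENTIALS.
    client: hypothetical hook for injecting a pre-built client (e.g. in tests).
    """

    def __init__(self, credentials=None, client=None):
        self.credentials = credentials
        self._client = client

    def _get_client(self):
        if self._client is None:
            # Imported lazily so the package is only required when a
            # gs:// URL is actually downloaded.
            from google.cloud import storage

            if self.credentials is not None:
                self._client = storage.Client.from_service_account_json(
                    self.credentials
                )
            else:
                self._client = storage.Client()
        return self._client

    def __call__(self, url, output_file, pooch, check_only=False):
        bucket_name, blob_name = parse_gs_url(url)
        blob = self._get_client().bucket(bucket_name).blob(blob_name)
        if check_only:
            # Only report whether the object exists; download nothing.
            return blob.exists()
        if isinstance(output_file, (str, os.PathLike)):
            blob.download_to_filename(os.fspath(output_file))
        else:
            # output_file may also be an open binary file object.
            blob.download_to_file(output_file)
        return None
```

Injecting a fake client is one way the tests requested above could check both successful downloads and the `ValueError` for malformed URLs without touching a real bucket.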

I can't speak to long-term maintenance, but I would be interested in adding tests and submitting a PR within the next month.

remrama added the enhancement label on Feb 22, 2024
@leouieda
Member

Thanks @remrama! We'd be happy to have this.

Projects
None yet
Development

No branches or pull requests

2 participants