Cache DataCite responses locally #31

Open
katrinleinweber opened this issue Mar 6, 2020 · 5 comments

@katrinleinweber

It would be useful for teams of bibliometricians to be able to share or sync a cache of their DataCite queries, analogous to what the pybliometrics Python package does, for example.

Which of the "cache"-related CRAN packages seems most suitable for that? I tried r-lib/memoise#106, but that approach seems to have failed.

@sckott
Contributor

sckott commented Mar 6, 2020

Thanks for the issue @katrinleinweber - What exactly do you want to cache? The HTTP response with headers and the raw response body? Or the parsed response as a data.frame/list? Or some other format?

sckott added the question label Mar 6, 2020

@katrinleinweber
Author

I think pybliometrics caches the entire HTTP response.

I presume the small saving in storage space is not worth the risk of discarding potentially useful metadata (maybe an ETag for requesting a refresh later?). Maybe @pybliometrics-dev can comment on their thinking about what to cache? I tried to find an explanation in the blame trail of the above-linked lines and read PR 17.
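
To make the "cache the entire HTTP response" option concrete, here is a minimal sketch in base R. It is not how pybliometrics or rdatacite do it; the helper names (cache_key, cached_fetch, fake_fetch), the cache location, and the example query URL are all hypothetical, and a stand-in function replaces the real HTTP client so the sketch runs offline.

```r
# Hypothetical sketch: keep the whole response (status, headers, body) on disk.
cache_dir <- tempfile("datacite-cache-")  # assumed location; could be a shared folder

cache_key <- function(url) {
  # Turn the URL into a filesystem-safe file name
  paste0(gsub("[^A-Za-z0-9]+", "_", url), ".rds")
}

cached_fetch <- function(url, fetch, dir = cache_dir) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(dir, cache_key(url))
  if (file.exists(path)) {
    return(readRDS(path))  # cache hit: no request is made
  }
  resp <- fetch(url)       # cache miss: perform the request
  saveRDS(resp, path)      # binary serialization, compressed by default
  resp
}

# Stand-in for a real HTTP client (assumption, for illustration only)
calls <- 0
fake_fetch <- function(url) {
  calls <<- calls + 1
  list(
    url = url,
    status = 200L,
    headers = list(etag = "\"abc123\"", `content-type` = "application/json"),
    body = charToRaw("{\"data\": []}")
  )
}

url <- "https://api.datacite.org/dois?query=climate"  # illustrative query
first  <- cached_fetch(url, fake_fetch)  # goes through fake_fetch
second <- cached_fetch(url, fake_fetch)  # served from disk
```

Because the headers are stored alongside the body, a later refresh could send the saved ETag in an If-None-Match header and only re-download when the server reports a change. Pointing dir at a synced folder would be one way for a team to share the cache, under the assumption that everyone builds keys the same way.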

@sckott
Contributor

sckott commented Mar 10, 2020

Thanks, @katrinleinweber.

I have been tinkering with a package in development, https://github.com/ropenscilabs/webmiddens, meant exactly for the use case of caching HTTP requests/responses with expiry, etc. I'll try to get that working.

@sckott
Contributor

sckott commented Mar 10, 2020

@katrinleinweber install the version on the middens branch: remotes::install_github("ropensci/rdatacite@middens")

The README https://github.com/ropensci/rdatacite/tree/middens#caching has some instructions on use.

The cached data is persistent on disk, in a binary format to save disk space. It's not really human readable; a readable format is a feature I want to add to webmiddens, though. You can set the cache folder path; see ?dc_caching

@katrinleinweber
Author

Awesome, thank you! I hope I'll have time to test it this March, but there is a risk that I won't.
