How to get files' ids? #32

Open
janxkoci opened this issue Dec 14, 2020 · 6 comments
@janxkoci

Hi, sorry for the stupid question, but I don't know how to get file ids so that I can download individual files from a Dryad dataset.

I tried looking at our published dataset with:

> dryad_dataset("10.5061/dryad.7nt8f")
# truncated output
$`10.5061/dryad.7nt8f`$id
[1] 6817

However, if I try to use that id to get files, it shows a different DOI for this id:

> dryad_files(6817)
# truncated output
$`6817`$`_links`$`stash:dataset`$href
[1] "/api/v2/datasets/doi%3A10.5061%2Fdryad.nf757"

i.e. the returned DOI is 10.5061/dryad.nf757 rather than 10.5061/dryad.7nt8f.

So how do I get:

  • the proper ids for my dataset, to be used in functions like dryad_files?
  • a link to a particular file (e.g. Appendix S2.txt in the dataset above)?
Session Info
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: elementary OS 5.1.7 Hera

Matrix products: default
BLAS/LAPACK: /home/jena/miniconda3/lib/libopenblasp-r0.3.12.so

locale:
 [1] LC_CTYPE=cs_CZ.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=cs_CZ.UTF-8        LC_COLLATE=cs_CZ.UTF-8    
 [5] LC_MONETARY=cs_CZ.UTF-8    LC_MESSAGES=cs_CZ.UTF-8   
 [7] LC_PAPER=cs_CZ.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rdryad_1.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5      magrittr_2.0.1  rappdirs_0.3.1  uuid_0.1-4     
 [5] R6_2.5.0        rlang_0.4.8     hoardr_0.5.2    tools_3.6.1    
 [9] htmltools_0.5.0 ellipsis_0.3.1  digest_0.6.27   httpcode_0.3.0 
[13] tibble_3.0.4    lifecycle_0.2.0 crayon_1.3.4    zip_2.1.1      
[17] IRdisplay_0.7.0 repr_1.1.0      base64enc_0.1-3 vctrs_0.3.5    
[21] triebeard_0.3.0 IRkernel_1.1.1  curl_4.3        crul_1.0.0     
[25] evaluate_0.14   mime_0.9        pbdZMQ_0.3-3.1  compiler_3.6.1 
[29] pillar_1.4.7    urltools_1.7.3  jsonlite_1.7.1  pkgconfig_2.0.3
@janxkoci
Author

Update

I noticed that I can use the number from a file's link on the Dryad website as ids, and it seems to work properly and get the right file. But how do I get those ids from rdryad?

For example the file Appendix S2.txt mentioned above is linked with the following url: https://datadryad.org/stash/downloads/file_stream/33893

Using 33893 as ids in functions returns the right DOI, file description, etc.:

> dryad_files(33893)
# truncated output
$`33893`$`_links`$`stash:dataset`$href
[1] "/api/v2/datasets/doi%3A10.5061%2Fdryad.7nt8f"
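The href values in these outputs are percent-encoded, which makes the two DOIs awkward to compare by eye. Below is a minimal base R sketch for decoding them, assuming the `stash:dataset` href always has the form `/api/v2/datasets/doi%3A...` as in the output shown here. `href_to_doi` is a hypothetical helper name, not an rdryad function.

```r
# Hypothetical helper (not part of rdryad): recover the DOI from a
# percent-encoded `stash:dataset` href like the ones printed above.
href_to_doi <- function(href) {
  decoded <- utils::URLdecode(href)        # "%3A" becomes ":", "%2F" becomes "/"
  sub("^.*/datasets/doi:", "", decoded)    # drop everything up to and including "doi:"
}

href_to_doi("/api/v2/datasets/doi%3A10.5061%2Fdryad.nf757")
# [1] "10.5061/dryad.nf757"
href_to_doi("/api/v2/datasets/doi%3A10.5061%2Fdryad.7nt8f")
# [1] "10.5061/dryad.7nt8f"
```

Comparing the decoded value with the DOI you started from is a quick way to check whether a given id points at the dataset you expect.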

@sckott
Contributor

sckott commented Dec 15, 2020

Thanks for opening the issue. It's quite a mystery to me too how it works. I'll have a look though.

@sckott sckott added this to the v1.1 milestone Dec 15, 2020
@sckott sckott added the Bug label Dec 15, 2020
@sckott
Contributor

sckott commented Dec 15, 2020

Sorry for the confusion on this. I hate to point fingers, but Dryad has not explained their API well at all, especially how the different ids work, and why we have to deal with their internal IDs and not just the DOI for the dataset itself. And they don't really respond to questions, so it really is a joy!

@sckott
Contributor

sckott commented Dec 15, 2020

Okay, so this should work, where you have to get version information first:

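# helper: last element of a vector
last <- function(x) x[length(x)]
# look up the versions of the dataset by its DOI
z <- dryad_dataset_versions("10.5061/dryad.7nt8f")
# link to the version; its last path component is the numeric version id
idpath <- z[[1]]$`_embedded`$`stash:versions`$`_links.self.href`
id <- as.numeric(last(strsplit(idpath, "/")[[1]]))
# gives you information about the files, including their individual IDs
dryad_versions_files(id)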

Then you still have to regex/etc. the IDs out of the strings for each file.
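
That last step could look roughly like the sketch below. It assumes each file's link is a string ending in the numeric ID, like the download URL quoted earlier in this thread. The `/api/v2/files/...` strings are made-up placeholders, since the actual output of dryad_versions_files() is not shown here, and `extract_id` is a hypothetical helper, not an rdryad function.

```r
# Hypothetical helper (not part of rdryad): pull the trailing numeric ID
# out of link strings; strings without a trailing "/<digits>" give NA.
extract_id <- function(x) {
  has_id <- grepl("/[0-9]+$", x)
  ids <- rep(NA_real_, length(x))
  ids[has_id] <- as.numeric(sub("^.*/([0-9]+)$", "\\1", x[has_id]))
  ids
}

links <- c(
  "https://datadryad.org/stash/downloads/file_stream/33893",  # URL quoted in this thread
  "/api/v2/files/12345",                                      # placeholder, assumed format
  "/api/v2/files/"                                            # no trailing ID
)
extract_id(links)
# [1] 33893 12345    NA
```

The resulting numbers can then be tried as ids in dryad_files(), the same way 33893 was used above.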

We really need to make this easier. Any pull requests are welcome; I don't have a lot of time to devote to this.

@janxkoci
Author

Thanks for your reply and tips.

Early next year I plan to work on a pipeline that starts by pulling data from Dryad, so I will work more closely with this package. I cannot promise anything, but I will see if I can help to make it work in some way.

@sckott
Contributor

sckott commented Dec 16, 2020

Thanks, sounds good
