automatically pick a result when multiple hits returned? classification() #890

Open
toczydlowski opened this issue Mar 17, 2022 · 7 comments

Comments

@toczydlowski

I am trying to run classification() in a loop on a computing cluster to look up lineage info for a large list of species-ish names. Right now I am getting an error on the cluster that, based on all of my debugging, comes from the entries that return multiple hits and then pop up a prompt asking for the number of the hit you want to select. Is it possible to have classification() automatically pick the first hit when multiple hits are found? Or do you have another suggestion for bypassing this issue when running automatically on a cluster? Thanks!

@sckott
Contributor

sckott commented Mar 17, 2022

@zachary-foster maybe i'm not remembering but I think rows is what you want https://docs.ropensci.org/taxize/reference/classification.html#arguments - and then you can programmatically decide what to do with results

@zachary-foster
Collaborator

Yea, rows will allow you to select which result before the query. For example:

classification('Asterina', db = 'ncbi')
classification('Asterina', db = 'ncbi', rows = 1)

Although, you might be looking for a fungus and get a starfish : )
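For the cluster use case, a minimal sketch of a non-interactive loop built on rows = 1 could look like the following. The helper name classify_first_hit and its lookup argument are hypothetical (they are not part of taxize); the only taxize call assumed is classification(name, db = db, rows = 1) as shown above. Failed lookups are recorded as NA so one bad name does not stop the whole run.

```r
# Hypothetical helper (not part of taxize): look up each name without
# prompting, always taking the first hit, and keep going if a lookup fails.
# `lookup` is injectable so the loop can be tried with a stub, offline.
classify_first_hit <- function(taxon_names, db = "ncbi",
                               lookup = function(name, db) {
                                 taxize::classification(name, db = db, rows = 1)
                               }) {
  results <- lapply(taxon_names, function(name) {
    tryCatch(
      # classification() returns a list with one element per query;
      # take the single element for this name.
      lookup(name, db)[[1]],
      error = function(e) {
        message("Lookup failed for '", name, "': ", conditionMessage(e))
        NA
      }
    )
  })
  names(results) <- taxon_names
  results
}

# Usage (needs taxize and network access):
# lineages <- classify_first_hit(c("Asterina", "Quercus"))
```

Because the first row is picked blindly, names with several hits (such as Asterina) still deserve a manual check afterwards.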

@toczydlowski
Author

ah, thanks! yes this does what I want it to. I think for now I'll go with rows = 1, knowing I might be getting some weird results by just blindly always picking first row. there will be more hands-on QC in this case downstream so I think this fix will work. thanks team!

@salix-d
Contributor

salix-d commented Jun 4, 2022

I don't know why these filters aren't integrated, but to get the genus from the right division you can do:

taxize::classification(taxize::get_uid('Asterina', division_filter = "ascomycete fungi")[1], "ncbi")
taxize::classification(taxize::get_uid('Asterina', division_filter = "starfish")[1], "ncbi")

There are similar functions to get ids for each database. The filters vary by function (different APIs).

@zachary-foster
Collaborator

@salix-d Good observation. I will try to look into making that an option for taxize::classification

@salix-d
Contributor

salix-d commented Mar 6, 2023

For some reason in the classification function, ncbi is the only one to not have ... as an argument in id <- process_ids(sci_id, db, get_uid, rows = rows). Just by adding that, we could then use division_filter from classification.

Although the argument's name changes between dbs, and I don't know that all dbs have that option (I know itis doesn't).
For bold you can use division and rank.
For gbif you can use kingdom, phylum, ..., genus. It can still return more than one result, but it does make sure that the one you want is at the top of the list, so you can feel more confident using rows = 1.

zachary-foster added a commit that referenced this issue Mar 9, 2023
@zachary-foster
Collaborator

Yea, I have been wanting to go through and make them all more consistent. Ideally even combine all the get_* functions into a single get_id_from_name function or something like that. In the meantime I added the ... to classification for NCBI like you suggested:

library(taxize)
taxize::classification('Asterina', division_filter = "ascomycete fungi", db = "ncbi")
#> ══  1 queries  ═══════════════
#> 
#> Retrieving data for taxon 'Asterina'
#> ✔  Found:  Asterina
#> ══  Results  ═════════════════
#> 
#> • Total: 1 
#> • Found: 1 
#> • Not Found: 0
#> $Asterina
#>                              name         rank      id
#> 1              cellular organisms      no rank  131567
#> 2                       Eukaryota superkingdom    2759
#> 3                    Opisthokonta        clade   33154
#> 4                           Fungi      kingdom    4751
#> 5                         Dikarya   subkingdom  451864
#> 6                      Ascomycota       phylum    4890
#> 7                  saccharomyceta        clade  716545
#> 8                  Pezizomycotina    subphylum  147538
#> 9                    leotiomyceta        clade  716546
#> 10                 dothideomyceta        clade  715962
#> 11                Dothideomycetes        class  147541
#> 12 Dothideomycetes incertae sedis      no rank  159987
#> 13                    Asterinales        order 1619909
#> 14                   Asterinaceae       family  281108
#> 15                       Asterina        genus  859380
#> 
#> attr(,"class")
#> [1] "classification"
#> attr(,"db")
#> [1] "ncbi"

Created on 2023-03-09 with reprex v2.0.2
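With division_filter passed through for NCBI, a batch lookup can pair each name with its own filter. This is only a sketch under assumptions: classify_with_division and its lookup argument are hypothetical names, and it relies on a taxize version that includes the change above. If a filter still leaves several hits, the prompt may presumably still appear, so adding rows = 1 to the call is worth considering for cluster runs.

```r
# Hypothetical helper (not part of taxize): `divisions` is a named character
# vector mapping taxon name -> NCBI division filter.
# `lookup` is injectable so the loop can be tried with a stub, offline.
classify_with_division <- function(divisions,
                                   lookup = function(name, division) {
                                     taxize::classification(
                                       name,
                                       db = "ncbi",
                                       division_filter = division
                                     )
                                   }) {
  # Map() names the result after the taxon names (its first argument).
  Map(function(name, division) {
    tryCatch(
      lookup(name, division)[[1]],
      error = function(e) {
        message("Lookup failed for '", name, "': ", conditionMessage(e))
        NA
      }
    )
  }, names(divisions), divisions)
}

# Usage (needs taxize and network access):
# classify_with_division(c(Asterina = "ascomycete fungi"))
```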


4 participants