Getting all synonyms, common names and Genbank common name along with the Scientific name of taxons #382

Closed
bilalix opened this issue Oct 25, 2018 · 0 comments · May be fixed by #383

bilalix commented Oct 25, 2018

I want to generate a JSON file that maps each taxon ID to its scientific name and synonyms (for simplicity, "synonyms" here covers synonyms, common names and the Genbank common name). It will have the following structure:

{
   taxonid1 : ["sciName", "syn1", "syn2", ...],
   taxonid2 : ["sciName", "syn1", "syn2", ...],
   ......
 }

I want to do this for all descendants of a given taxon (e.g. Viridiplantae), based on their taxon IDs.

To get the desired results, I first used ncbi.get_descendant_taxa():

descendants = ncbi.get_descendant_taxa('Viridiplantae', intermediate_nodes=True)

This gives me the list of taxon IDs. Afterwards, I downloaded the names.dmp file (which contains the synonyms) from NCBI and extracted the information I needed from it.
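A minimal sketch of that extraction step, assuming the standard names.dmp layout (tax_id, name_txt, unique name and name class, separated by `\t|\t`, with each row ending in `\t|`). The function name `parse_names_dmp`, the `SYNONYM_CLASSES` selection and the sample rows are illustrative assumptions; the sample names are the ones from the Triticum aestivum example in this post.

```python
import json
from collections import defaultdict

# Name classes folded into "synonyms". This selection is an assumption based on
# the classes mentioned in this issue, not a complete list of names.dmp classes.
SYNONYM_CLASSES = {"synonym", "common name", "genbank common name"}


def parse_names_dmp(lines, wanted_taxids):
    """Return {taxid: [scientific_name, other_name, ...]} for the wanted taxids."""
    wanted = set(wanted_taxids)
    sci_names = {}
    others = defaultdict(list)
    for line in lines:
        row = line.rstrip("\n")
        if row.endswith("\t|"):
            row = row[:-2]  # drop the row terminator
        fields = row.split("\t|\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        taxid, name, name_class = int(fields[0]), fields[1], fields[3]
        if taxid not in wanted:
            continue
        if name_class == "scientific name":
            sci_names[taxid] = name
        elif name_class in SYNONYM_CLASSES:
            others[taxid].append(name)
    # Scientific name first, then the remaining names in file order.
    return {taxid: [sci] + others[taxid] for taxid, sci in sci_names.items()}


# Illustrative rows shaped like names.dmp (not copied from the real file).
SAMPLE_ROWS = [
    "4565\t|\tTriticum aestivum L.\t|\t\t|\tscientific name\t|\n",
    "4565\t|\tbread wheat\t|\t\t|\tgenbank common name\t|\n",
    "4565\t|\tTriticum vulgare L.\t|\t\t|\tsynonym\t|\n",
    "4565\t|\tWheat\t|\t\t|\tcommon name\t|\n",
]

result = parse_names_dmp(SAMPLE_ROWS, {4565})
print(json.dumps(result, indent=2))  # integer keys are written as JSON strings
```

With the real file, the same function could be called as `parse_names_dmp(open("names.dmp"), descendants)`, assuming `descendants` is the list of integer taxon IDs obtained above.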

This feels redundant, since ete3 already downloads the dump files and stores them in an SQLite database. I had to take this approach, though, because when I looked in the database I did not find all the synonyms. For instance, Triticum aestivum has the following synonyms:

Scientific name: Triticum aestivum L.
Genbank common name: bread wheat
Synonym: Triticum aestivum subsp. aestivum
         Triticum vulgare L.
Common name: Canadian hard winter wheat
             Common wheat
             Wheat

My question: would it be possible to add all of this information when the database is created? For instance, if we called

ncbi.get_common_names([4565])

We can get:

{4565:  ["Canadian hard winter wheat", "Common wheat", "Wheat"]}

And the same for synonyms and the Genbank common name?
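To make the request concrete, here is a rough sketch of how such a lookup might work if every name row were stored together with its name class. The `names` table, its columns, and the helpers `build_names_table` and `get_names_by_class` are all hypothetical; they are not ete3's existing schema or API, only an illustration using the standard `sqlite3` module.

```python
import sqlite3


def build_names_table(conn, rows):
    """Store (taxid, name, name_class) rows in a hypothetical `names` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS names (taxid INT, name TEXT, name_class TEXT)"
    )
    conn.executemany("INSERT INTO names VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS names_taxid ON names (taxid, name_class)"
    )
    conn.commit()


def get_names_by_class(conn, taxids, name_class):
    """Return {taxid: [name, ...]} for one name class, e.g. 'common name'."""
    taxids = list(taxids)
    if not taxids:
        return {}
    # One placeholder per taxid; very long lists would need to be chunked
    # to stay under SQLite's bound-parameter limit.
    placeholders = ",".join("?" * len(taxids))
    query = (
        "SELECT taxid, name FROM names "
        f"WHERE name_class = ? AND taxid IN ({placeholders}) ORDER BY rowid"
    )
    result = {}
    for taxid, name in conn.execute(query, [name_class, *taxids]):
        result.setdefault(taxid, []).append(name)
    return result


# Illustrative data: the Triticum aestivum names listed in this issue.
conn = sqlite3.connect(":memory:")
build_names_table(conn, [
    (4565, "Triticum aestivum L.", "scientific name"),
    (4565, "bread wheat", "genbank common name"),
    (4565, "Triticum aestivum subsp. aestivum", "synonym"),
    (4565, "Triticum vulgare L.", "synonym"),
    (4565, "Canadian hard winter wheat", "common name"),
    (4565, "Common wheat", "common name"),
    (4565, "Wheat", "common name"),
])
print(get_names_by_class(conn, [4565], "common name"))
```

Keeping the name class as a column means one generic query can serve common names, synonyms and the Genbank common name alike, instead of needing a separate table for each.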

Thank you!

bilalix pushed a commit to bilalix/ete that referenced this issue Oct 30, 2018
bilalix closed this as completed Oct 30, 2018