Suggest and review nomenclatural sources #22

Open
mdoering opened this issue May 16, 2017 · 12 comments
@mdoering
Member

Initial imports from sources relevant to nomenclature should be considered. Please suggest relevant nomenclatural sources as comments in this issue. The key information is the name with authorship, the literature reference (ideally with a DOI or link), the type material, and the basionym/protonym information.

Relevant sources that should be synced continuously:

Other potential sources

@rdmpage

rdmpage commented May 16, 2017

I have several projects that map names to literature, using nomenclators as a starting point, such as ION (used by BioNames), IPNI (mapping well underway), and Index Fungorum (just started). Happy to contribute literature links. I'm also working on clustering names within nomenclators to get around massive duplication (e.g., in ION and IPNI).
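As a rough illustration of what clustering duplicate name strings can look like (a minimal sketch only, not the method used in any of the projects mentioned here), records can be grouped by a normalized key. The function names `name_key` and `cluster_names` and the sample names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Build a crude clustering key: drop accents, punctuation, case, extra spaces."""
    # Decompose accented characters, then discard the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation with spaces and lowercase everything.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    # Collapse runs of whitespace.
    return " ".join(text.split())


def cluster_names(records):
    """Group (source, name) pairs whose normalized keys are identical."""
    clusters = defaultdict(list)
    for source, name in records:
        clusters[name_key(name)].append((source, name))
    return dict(clusters)


# Hypothetical sample records, not taken from any real nomenclator.
records = [
    ("source A", "Aus bus Müller, 1900"),
    ("source B", "Aus bus (Muller 1900)"),
    ("source A", "Aus cus Smith, 1910"),
]
clusters = cluster_names(records)
```

Exact matching on a normalized key only catches trivial variants; spelling variants and differing author abbreviations would need fuzzy matching on top of this.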

@rdmpage

rdmpage commented May 16, 2017

Some additional name sources include:

World Spider Catalog: LSIDs with literature as strings; some overlap with ION, but better-quality literature citations for older names
Species File projects: LSIDs (I guess @mjy can talk about these projects)
Nomenclator Zoologicus: I have a version of the uBio file with links to ION, BHL, etc.

Some databases are explicitly about nomenclature, many are also about taxonomy (or confound the two).

@deepreef
Collaborator

This is EXCELLENT! And EXACTLY what is needed, in my opinion! @rdmpage -- have you and @dimus compared systems for clustering names? Massive duplications also exist among literature citations (including the dataset you gave me a few years ago). Is there any work within this group to do similar clustering of literature citations? I've been chipping away at this, starting with Journal names. @gsautter has worked on this through RefBank, and there is a parsing service available that seems to work pretty well.

As I've said many times before, reconciling names is relatively easy compared to reconciling literature citations (I would estimate that 80% of the effort to reconcile and import a batch of names into GNUB is spent on reconciling and importing the associated literature) -- probably why there are so many efforts to build lists of names, and so few that focus on linking those names to literature.

In any case, I hope CoLPlus remains committed to incorporating a "names-linked-to-literature" approach, rather than just another "names and associated concepts" approach. It requires a bit of extra work up-front, but the rewards are VASTLY greater.

@dremsen
Collaborator

dremsen commented May 17, 2017

Don't forget Sherborn's Index Animalium, Rich. I think you would have the most up-to-date and parsed copy. If there is more parsing to do, we might consider seeing if Dima is up for it, but most should be in good shape. For subsequent combinations, there is a reference to the original combination (I think it's just a reference to the original genus), so there are homotypic synonyms accessible there as well. Some taxonomic databases have parsed and separate nomenclature databases inherent to them. I can recall Thompson's Diptera, and there is an algal nomenclator. Index Fungorum, of course, etc.

@rdmpage

rdmpage commented May 17, 2017

@dremsen The whole Sherborn - ION - BHL mapping doi:10.3897/zookeys.550.9673 should be opened up as well. AFAIK ION have it but haven't made it available to anyone not visiting their web site (e.g., I gather that BHL don't have it). I've made a start on trying to resurrect it via screen scraping, see https://github.com/rdmpage/ion-sherborn

I've also grabbed a copy of Index Animalium and put it in a repository https://github.com/rdmpage/index-animalium

@deepreef
Collaborator

deepreef commented May 17, 2017

@dremsen and @rdmpage : Index Animalium represents the PERFECT example of what I'm talking about. There are 7,723 literature citations in the combined bibliography, and 429,829 TNUs (approximately 350K Protonyms). It's an absolute GOLD MINE of information (massive numbers of Protonyms, homotypic synonyms/combinations/spelling variants, etc.), ALL of which are anchored to literature. The records (both bibliography and TNUs) are almost completely parsed (just another week or so needed to finish parsing the microcitations connected to each TNU record).

So... what's the hold-up? The literature! The bibliography is highly abbreviated (e.g., no titles and highly abbreviated -- and inconsistent -- Journal names). Even though it's almost fully parsed, most of the records have scant field values. Suzanne Pilsk (lead author of the paper cited by Rod) had made it her mission to tie Sherborn bibliography records to proper citations, and as of the last cut I got from her, 4,477 of them had been fleshed out. The remaining 3,246 represent (almost by definition) the most difficult to pin down. I had been working on cleaning up just the Journals, with the hope of identifying full citations (e.g., from RefBank) via Journal+Volume+Startpage, but there are no page numbers (bummer), and there are still over 2,800 unique and highly abbreviated text strings from which Journals need to be derived.
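To make the Journal clean-up step concrete, here is a minimal sketch of one possible way to shortlist full journal titles for an abbreviated string: each abbreviated token must be a prefix of a title word, in order. This is an assumption for illustration, not the approach actually used on the Sherborn data; the function names, the stopword list, and the sample titles are hypothetical choices.

```python
import re

# Small words that abbreviations commonly drop (illustrative, not exhaustive).
STOPWORDS = {"of", "the", "and", "de", "des", "der", "du", "la", "le"}


def tokens(text: str):
    """Lowercase alphabetic tokens, ignoring punctuation and digits."""
    return [t for t in re.findall(r"[^\W\d_]+", text.lower()) if t not in STOPWORDS]


def is_abbreviation_of(abbrev: str, title: str) -> bool:
    """True if every abbreviated token is a prefix of a title token, in order."""
    abbr = tokens(abbrev)
    full = tokens(title)
    if not abbr:
        return False
    pos = 0
    for a in abbr:
        # Advance through the title until a word starting with this token is found.
        while pos < len(full) and not full[pos].startswith(a):
            pos += 1
        if pos == len(full):
            return False
        pos += 1
    return True


def candidate_titles(abbrev: str, titles):
    """Return every full title the abbreviated string could stand for."""
    return [t for t in titles if is_abbreviation_of(abbrev, t)]


# Illustrative titles only; a real run would use a curated journal list.
titles = [
    "Annals and Magazine of Natural History",
    "Proceedings of the Zoological Society of London",
    "Annales des Sciences Naturelles",
]
matches = candidate_titles("Ann. Mag. Nat. Hist.", titles)
```

A prefix test like this only produces candidates; very short or inconsistent abbreviations will match several titles and still need manual review.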

Once we do clean up & flesh out the literature (or decide that we're OK with dirty microcitations as our anchorpoints to the literature), the next hurdle will be to cross-link the microcitations in the TNU records (again, incomplete & inconsistent) to the corresponding bibliography record. That should be relatively straightforward -- maybe a week or two to complete. After that, the names are an absolute breeze (probably less than a day's work).

If we're OK with incomplete bibliographic citations (which won't connect us to BHL pages for now, but we can flesh them out later), I'm willing to dust that project off and bump it to the top of my "CFT" (Copious Free Time) priority list, if this group thinks it's a worthy investment of time (actually, after looking at the DB, I'm getting more excited about it myself).

Bottom line: Sherborn is not a "names" problem (we already have the names as Name-Strings, plus authors, combination authors, etc.). It's a literature problem -- which brings me back to my previous post on this.

@mdoering
Member Author

@deepreef I have created a new issue #23 to discuss how to deal with literature. Let's keep this issue for listing nomenclatural sources.

Making Index Animalium open and accessible would be a very good thing.
Is the version you have, Rich, the best there is to continue with? Or do the versions from Rod add anything not present in yours?

@deepreef
Collaborator

Understood, and agreed! I just wanted to use @dremsen's suggestion of Sherborn to illustrate the point made earlier. Also, we already have zillions of sources of names. There is no shortage of those. What we need for CoL+ to actually get beyond what we already have is sources of names linked to literature, and Sherborn's Index Animalium is a big one! :-)

I'm happy to share what I have, but perhaps give me a week to clean up a few loose ends. I don't know the state of the others (mine is a more highly parsed version of what @dremsen provided to me years ago, which I believe was originally parsed by Pat Leary).

@rdmpage

rdmpage commented May 17, 2017

Oh, and let's not forget Wikispecies, which is mixed but has lots of literature. Unfortunately it's in a somewhat idiosyncratic format. I'll be at Wikicite 2017 next week working on a tool to parse Wikispecies citations. Apart from the literature, Wikispecies is a potential source of links to Wikidata and author identifiers, so it has an important role to play for those of us obsessed with linking stuff together.

@mdoering
Member Author

Also see official code lists #26

@mdoering
Member Author

Came across yet another Fungi nomenclator: http://www.cybertruffle.org.uk/cybernome/eng/index.htm

@yroskov
Contributor

yroskov commented Nov 22, 2017 via email
