Browser links to multiple wrong external identifiers for taxon "Candidatus Endoriftia persephone" #968

Open
kbseah opened this issue Mar 14, 2024 · 10 comments


@kbseah

kbseah commented Mar 14, 2024

While familiarizing myself with the query browser, I found an unusual error when the taxon Candidatus Endoriftia persephone is used in a query (adding a screenshot in case the problem is specific to my browser):
https://www.globalbioticinteractions.org/?interactionType=interactsWith&targetTaxon=Candidatus%20Endoriftia%20persephone

[Screenshot: globi_endoriftia_error]

Under the taxon name, instead of a single link per external database as expected, there are multiple links. What they all have in common is that the names start with Candidatus. This is a designation used only in prokaryotic nomenclature for provisional names of taxa that are not yet in culture, because the prokaryotic code requires a type culture for valid publication (details).

Perhaps Candidatus names are being misinterpreted as Linnean trinomials by the name matcher?

Surprisingly there was no Wikidata item for Candidatus Endoriftia persephone, but I've just created one: https://www.wikidata.org/wiki/Q124846046

@jhpoelen
Member

@kbseah thanks for sharing your observations re: Candidatus names.

> Perhaps Candidatus names are being misinterpreted as Linnean trinomials by the name matcher?

Yes, perhaps! I'd have to look into the taxon alignment code and make sure that the candidatus prefix is handled properly.
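
To illustrate the suspected failure mode, here is a minimal Python sketch. It is not GloBI's taxon alignment code; the function names and the prefix pattern are hypothetical and only show how a whitespace-based parser could read the Candidatus marker as a genus, and how stripping the marker first avoids that.

```python
import re

# Assumption: the provisional-name marker appears as a leading token,
# either spelled out ("Candidatus") or abbreviated ("Ca.").
CANDIDATUS_PREFIX = re.compile(r'^"?\s*(?:Candidatus|Ca\.)\s+', re.IGNORECASE)


def naive_parse(name: str) -> dict:
    """Map whitespace-separated tokens to genus / species / infraspecies."""
    keys = ["genus", "specificEpithet", "infraspecificEpithet"]
    return dict(zip(keys, name.split()))


def prefix_aware_parse(name: str) -> dict:
    """Strip a leading Candidatus marker, then parse the remaining tokens."""
    stripped, count = CANDIDATUS_PREFIX.subn("", name.strip())
    parsed = naive_parse(stripped.strip('"'))
    parsed["candidatus"] = count > 0
    return parsed


if __name__ == "__main__":
    name = "Candidatus Endoriftia persephone"
    print(naive_parse(name))         # "Candidatus" ends up in the genus slot
    print(prefix_aware_parse(name))  # genus is "Endoriftia", candidatus flag set
```

With the naive version, the name looks like a trinomial whose genus is "Candidatus", which would explain matches against many unrelated Candidatus taxa.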

Meanwhile, @dimus or @mdoering may have insights into how their parsers (GNparser, https://github.com/gnames/gnames, and the GBIF name parser, https://github.com/gbif/name-parser, respectively) handle Candidatus names.

> Surprisingly there was no Wikidata item for Candidatus Endoriftia persephone, but I've just created one: https://www.wikidata.org/wiki/Q124846046

Thanks for adding the wikidata item!

@dimus

dimus commented Apr 15, 2024

Hi @kbseah, from the point of view of GNparser: Candidatus Endoriftia persephone

https://parser.globalnames.org/?format=html&names=Candidatus+Endoriftia+persephone&with_details=on

@mdoering

There is a special property for candidate names in the parsed result: http://api.checklistbank.org/parser/name?q=Candidatus%20Endoriftia%20persephone

The name in the label is then quoted in its entirety.

@jhpoelen
Member

@dimus Thanks for sharing example results from the Global Names parser v1.9.1.

@mdoering Also, thanks for sharing your candidatus example. Which version of the GBIF parser are you using?

@mdoering

checklistbank always uses the very latest release, 3.11.0 at this time: https://github.com/CatalogueOfLife/backend/blob/master/pom.xml#L83
The parser exposed at GBIF lags a bit behind.

@jhpoelen
Member

Thanks for sharing the version of the GBIF parser. Would it be an idea to include this in the result snippets?

@mdoering

mdoering commented Apr 15, 2024

You mean the version? You can get it from the API, and then use it to find the corresponding parser and all other dependency versions:
https://api.checklistbank.org/version

Takes you to:
CatalogueOfLife/backend@81e1c72

https://github.com/CatalogueOfLife/backend/blob/81e1c72/pom.xml#L83

@jhpoelen
Member

jhpoelen commented Apr 15, 2024

I was imagining embedding the version of the parse method in the parse response.

Currently,

curl -s "https://api.checklistbank.org/parser/name?q=Candidatus%20Endoriftia%20persephone" \
  | jq .

produces:

{
  "scientificName": "\"Candidatus Endoriftia persephone\"",
  "rank": "species",
  "genus": "Endoriftia",
  "specificEpithet": "persephone",
  "candidatus": true,
  "code": "bacterial",
  "type": "scientific",
  "label": "\"Candidatus Endoriftia persephone\"",
  "labelHtml": "\"Candidatus Endoriftia persephone\"",
  "parsed": true
}
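
As a hedged sketch of how a client might consume this response, the Python snippet below reads the `candidatus` flag and rebuilds an unquoted name from the `genus` and `specificEpithet` fields instead of relying on the quoted `label`. The response body is copied from the output above; the helper names `is_candidatus` and `canonical_name` are hypothetical and not part of any parser API.

```python
import json

# Response body copied from the parser output shown above (raw string so the
# JSON escape sequences \" are passed through to json.loads unchanged).
RESPONSE = r'''
{
  "scientificName": "\"Candidatus Endoriftia persephone\"",
  "rank": "species",
  "genus": "Endoriftia",
  "specificEpithet": "persephone",
  "candidatus": true,
  "code": "bacterial",
  "type": "scientific",
  "label": "\"Candidatus Endoriftia persephone\"",
  "labelHtml": "\"Candidatus Endoriftia persephone\"",
  "parsed": true
}
'''


def is_candidatus(parsed: dict) -> bool:
    """True when the parser flagged the name as a Candidatus name."""
    return bool(parsed.get("candidatus", False))


def canonical_name(parsed: dict) -> str:
    """Join genus and specific epithet, skipping any missing field."""
    parts = [parsed.get("genus"), parsed.get("specificEpithet")]
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    parsed = json.loads(RESPONSE)
    print(is_candidatus(parsed), canonical_name(parsed))
```

A name matcher that keys on the parsed fields rather than on the raw input string would not see "Candidatus" as a name component at all.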

I suggest adding some provenance, for example:

{
  "scientificName": "\"Candidatus Endoriftia persephone\"",
  "rank": "species",
  "genus": "Endoriftia",
  "specificEpithet": "persephone",
  "candidatus": true,
  "code": "bacterial",
  "type": "scientific",
  "label": "\"Candidatus Endoriftia persephone\"",
  "labelHtml": "\"Candidatus Endoriftia persephone\"",
  "parsed": true,
  "generatedBy": "https://github.com/CatalogueOfLife/backend/commit/81e1c72"
}
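
Until something like this exists server-side, a client could attach the same provenance itself. The Python sketch below is only an illustration: `with_provenance` is a hypothetical helper, and it assumes the caller has already obtained the backend commit hash (for example 81e1c72, as linked above).

```python
# Base URL taken from the suggested "generatedBy" value above.
BACKEND_COMMIT_URL = "https://github.com/CatalogueOfLife/backend/commit/"


def with_provenance(parsed: dict, commit: str) -> dict:
    """Return a copy of a parse result with a generatedBy commit URL added."""
    annotated = dict(parsed)  # shallow copy; the input dict is left untouched
    annotated["generatedBy"] = BACKEND_COMMIT_URL + commit
    return annotated


if __name__ == "__main__":
    result = with_provenance({"genus": "Endoriftia", "parsed": True}, "81e1c72")
    print(result["generatedBy"])
```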

@jhpoelen
Member

@kbseah Apologies for the delay in dealing with the Candidatus issue. Hoping to get to it sooner rather than later. Thanks for being patient.

@kbseah
Author

kbseah commented May 23, 2024

Hi @jhpoelen, there's no need to apologise! Thanks for your continued work in maintaining this resource for everybody.

jhpoelen added a commit to globalbioticinteractions/name-alignment-template that referenced this issue May 24, 2024
jhpoelen added a commit to globalbioticinteractions/name-alignment-template that referenced this issue May 24, 2024