DwC import suggestions for future version #55

Open
debpaul opened this issue Dec 3, 2021 · 6 comments
@debpaul
Contributor

debpaul commented Dec 3, 2021

Some things to consider regarding the DwC Importer and notes / mapping in the DWC_IMPORT.md file.

  • basisOfRecord has a controlled vocabulary ("classes") associated with it. (i.e. PreservedSpecimen, FossilSpecimen, LivingSpecimen, MaterialSample, Event, HumanObservation, MachineObservation, Taxon, Occurrence, MaterialCitation).
    • with Jared's data from INHS, we need FossilSpecimen not "PreservedSpecimen"
    • TO DO (discuss), please accept any of the suggested terms, at least PreservedSpecimen OR FossilSpecimen
  • in the docs we need to make it clear that the "Mapping" in the IMPORT.md is not necessarily equal to the Darwin Core term definition (example: preparations)
  • in the docs we need to explain why we don't just import what someone has already mapped to the dwc:preparations field (it is essentially a free-text string without much standardization; or, if it is standardized, it is only within that organization)
  • we need to explain how to add data (the 3 types Matt mentions): three parsed header types (DwC, TaxonWorks native, TaxonWorks predicate).
@mjy
Member

mjy commented Dec 3, 2021

> with Jared's data from INHS, we need FossilSpecimen not "PreservedSpecimen"

This is kind of lousy semantics, but we can do it. Fossils a) can be preserved, and b) are preserved, just not by humans.

@debpaul
Contributor Author

debpaul commented Dec 3, 2021

> with Jared's data from INHS, we need FossilSpecimen not "PreservedSpecimen"
>
> This is kind of lousy semantics, but we can do it. Fossils a) can be preserved, and b) are preserved, just not by humans.

Yah, it's a huge issue though. A flag of some sort is needed (basisOfRecord is likely to change in the future). In aggregation, people want a way to sort / get ONLY fossils, or NOT fossils. Hence, this request. It frustrates A LOT of paleontologists when they cannot do this with aggregated data. Similarly, it is hard to help the marine folks get only the "Marine" specimens or "non-terrestrial" ones from aggregated data.

@mjy
Member

mjy commented Dec 3, 2021

It's a general problem, filtering by attributes of a CollectionObject. Treating those attributes as row-types is not the answer. Understanding how to nest those attributes, and where we go to look for them, can better help resolve this. I'm sure there is a URI for fossil out there somewhere, if it matches the definition then this is the data attribute to hang off the collection object.

@debpaul
Contributor Author

debpaul commented Dec 9, 2021

@LocoDelAssembly could you please add the 3 data types we can map to this form? https://github.com/SpeciesFileGroup/taxonworks_doc/blob/dwc/manuals/DWC_IMPORT.md That is, the dwc terms we can import, and the two other types (TW: ...) and?

@LocoDelAssembly
Collaborator

The importer creates collection objects out of the occurrence records, so basisOfRecord could be anything that maps into a collection object. Is there any special handling for FossilSpecimen, or could I just add it as another allowed term @mjy?

@mjy
Member

mjy commented Dec 10, 2021

@LocoDelAssembly let's do this:

All FossilSpecimen must have a BiocurationClassification linked to a term with the URI http://rs.tdwg.org/dwc/terms/FossilSpecimen.

Importer behaviour:

  • Records fail to import unless the BiocurationClass is available.
  • BiocurationClass is not created on demand of import.
  • If the URI is not available:
    • Include in the Error message the statement above.
    • Alternatively, add a button in "Settings" to auto-create the BiocurationClass as a one-off, if needed.
    • Biocuration class to create on settings click: `{name: 'fossil', definition: 'The collection object that was found in the environment dead, and that has part or all of its material preserved by geological or environmental processes, for example by mineralization.', uri: 'http://rs.tdwg.org/dwc/terms/FossilSpecimen'}`
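
The behaviour listed above could be sketched roughly as follows. This is a framework-free illustration in plain Ruby, not the actual importer code: `BiocurationClass` here is a simple `Struct`, and `MissingBiocurationClass`, `ensure_fossil_class!`, and `import_row` are hypothetical names invented for the example. Only the URI, the class attributes, and the error statement come from the comment above.

```ruby
# Hypothetical sketch; none of these names are real TaxonWorks APIs.
FOSSIL_URI = 'http://rs.tdwg.org/dwc/terms/FossilSpecimen'

# Stand-in for a project-level biocuration class record.
BiocurationClass = Struct.new(:name, :definition, :uri, keyword_init: true)

# Raised when a FossilSpecimen row is imported without the class in place.
class MissingBiocurationClass < StandardError; end

# One-off creation, as a "Settings" button might trigger it.
# Idempotent: returns the existing class if one already carries the URI.
def ensure_fossil_class!(classes)
  existing = classes.find { |c| c.uri == FOSSIL_URI }
  return existing if existing

  created = BiocurationClass.new(
    name: 'fossil',
    definition: 'The collection object that was found in the environment dead, ' \
                'and that has part or all of its material preserved by geological ' \
                'or environmental processes, for example by mineralization.',
    uri: FOSSIL_URI
  )
  classes << created
  created
end

# Imports a single row. The class is never created on demand here:
# a FossilSpecimen row fails unless the class already exists.
def import_row(row, classes)
  basis = row['basisOfRecord']
  return { basis_of_record: basis, biocuration_classes: [] } unless basis == 'FossilSpecimen'

  fossil = classes.find { |c| c.uri == FOSSIL_URI }
  if fossil.nil?
    raise MissingBiocurationClass,
          'All FossilSpecimen must have a BiocurationClassification linked ' \
          "to a term with the URI #{FOSSIL_URI}."
  end

  { basis_of_record: basis, biocuration_classes: [fossil] }
end
```

With an empty list of classes, `import_row({ 'basisOfRecord' => 'FossilSpecimen' }, [])` raises with the statement above. After a single call to `ensure_fossil_class!`, the same row imports and carries the fossil class, so fossils can later be filtered by that attribute rather than by row type.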
