
As a curator I want to know the best practices for asserting "unknowable" #36

Open
mjy opened this issue Feb 19, 2021 · 6 comments

Comments

@mjy
Member

mjy commented Feb 19, 2021

No description provided.

@mjy
Member Author

mjy commented Feb 19, 2021

At present, the semantics are to assign a Confidence whose definition is along the lines of "I am confident and assert that this attribute on this instance of this class is unknowable." Specific confidence levels that extend this concept by adding a "why?" are possible, for example:

  • because the physical thing is destroyed (head, book, etc.)
  • because the original data are incomprehensible

It is perhaps best to use as few reasons as possible for why something is unknowable, as it is highly doubtful that curating at a finer granularity will actually result in meaningful broader data integration, etc. The principle is: minimize the amount of downstream re-interpretation you force people to do. Downstream consumers of your assertions (e.g., scientists doing science with your data) are going to make a few boolean decisions as to whether or not the data are useful for their needs.
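A minimal sketch of that principle in Python, assuming a hypothetical data model: the names `UnknowableReason`, `UnknowableAssertion`, and `is_knowable` are illustrative only and are not part of any existing API. The reasons are a deliberately short enumeration mirroring the two examples above, and the downstream consumer collapses everything into a single yes/no.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class UnknowableReason(Enum):
    """Hypothetical, deliberately small set of reasons a value is unknowable."""
    PHYSICAL_THING_DESTROYED = "physical thing destroyed"  # e.g. head, book
    ORIGINAL_DATA_INCOMPREHENSIBLE = "original data incomprehensible"


@dataclass(frozen=True)
class UnknowableAssertion:
    """Hypothetical record: a curator asserts one attribute of one instance is unknowable."""
    instance_id: str
    attribute: str
    reason: UnknowableReason


def is_knowable(assertions: Iterable[UnknowableAssertion],
                instance_id: str, attribute: str) -> bool:
    """Downstream boolean decision: False if any curator asserted this attribute unknowable.

    The reason is deliberately ignored here; it exists for humans auditing the assertion.
    """
    return not any(a.instance_id == instance_id and a.attribute == attribute
                   for a in assertions)


# Usage: one specimen's head is destroyed, so its head length can never be measured.
assertions = [
    UnknowableAssertion("specimen-1", "head_length",
                        UnknowableReason.PHYSICAL_THING_DESTROYED),
]
print(is_knowable(assertions, "specimen-1", "head_length"))  # False
print(is_knowable(assertions, "specimen-2", "head_length"))  # True
```

Note that the consumer never branches on `reason`. Under this sketch's assumptions, a finer-grained reason list would add curation cost without changing the boolean most consumers actually compute.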

mjy changed the title from "As a curator I want to know the best practices for indicating "unknowable"" to "As a curator I want to know the best practices for asserting "unknowable"" on Feb 19, 2021
@debpaul
Contributor

debpaul commented Feb 22, 2021

Hm. See whether this paper helps with documenting, unambiguously, what is meant by "unknown." Note that the #DiSSCo folks are thinking hard about this and want to standardize the use of "unknown" across their network if possible. See:

Quentin Groom, Mathias Dillen, Helen Hardy, Sarah Phillips, Luc Willemse, Zhengzhe Wu, Improved standardization of transcribed digital specimen data, Database, Volume 2019, 2019, baz129, https://doi.org/10.1093/database/baz129

Table 2 from their paper (regarding Unknown and incomplete data):

| Missing data term | Definition | Example |
| --- | --- | --- |
| unknown | The information is not digitally available. | Empty value in a digital record of unknown provenance |
| unknown:undigitized | The information is not digitally available. No attempt has been made to digitize it. | Empty value in a skeletal record to which data still need to be added from the label |
| unknown:missing | The information is not digitally available. It appeared to be absent during digitization. | A value of S.D. used by transcription platforms to indicate the absence of a date value |
| unknown:indecipherable | The information is not digitally available. It appeared to be present during digitization, but failed to be captured. | An indication made by a transcriber that they failed to transcribe the information |
| known:withheld | The information is digitally available, but it has been withheld by the provider. | A georeferenced record for which coordinate data are available but withheld for conservation considerations |
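To make the structure of Table 2 concrete, here is a small Python sketch that splits each term into a status (`unknown` or `known`) and an optional qualifier. The helper names (`MissingDataTerm`, `parse_term`, `is_digitally_available`) are hypothetical and are not part of Groom et al. or any DiSSCo tooling; the only facts encoded are the five terms and the table's statement that only `known:withheld` describes digitally available information. Since none of the five terms expresses "unknowable", the sketch rejects that string.

```python
from typing import NamedTuple, Optional


class MissingDataTerm(NamedTuple):
    """Hypothetical parsed form of a Table 2 term."""
    status: str               # "unknown" or "known"
    qualifier: Optional[str]  # e.g. "undigitized"; None for the bare "unknown"


# The five terms listed in Table 2 of Groom et al. (2019).
TABLE_2_TERMS = (
    "unknown",
    "unknown:undigitized",
    "unknown:missing",
    "unknown:indecipherable",
    "known:withheld",
)


def parse_term(term: str) -> MissingDataTerm:
    """Split a term such as 'unknown:missing' into its status and optional qualifier."""
    if term not in TABLE_2_TERMS:
        raise ValueError(f"not a Table 2 term: {term!r}")
    status, _, qualifier = term.partition(":")
    return MissingDataTerm(status, qualifier or None)


def is_digitally_available(term: str) -> bool:
    """Per the table's definitions, only 'known:*' terms describe digitally available data."""
    return parse_term(term).status == "known"


print(parse_term("unknown:indecipherable"))  # MissingDataTerm(status='unknown', qualifier='indecipherable')
print(is_digitally_available("known:withheld"))  # True
# parse_term("unknowable") raises ValueError: the vocabulary has no such term.
```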

@mjy
Member Author

mjy commented Feb 22, 2021

Thanks. All of these are valid assertions, none of these are the assertion of "unknowable" :)

@debpaul
Contributor

debpaul commented Feb 22, 2021

> All of these are valid assertions, none of these are the assertion of "unknowable" :)

So, that's a good one for them to try to add!

@debpaul
Contributor

debpaul commented Feb 22, 2021

Hm. unknown:indecipherable might be why something is "unknowable."

@mjy
Member Author

mjy commented Feb 22, 2021

Not the same, I think. In that case the data are present, but computers can't make inferences from them.

I find this somewhat telling. Rather than starting with what curators might tell us and trying to get that into the standard, this seems to start with a digital product and its nature. By contrast, the most basic assertion a curator on the ground needs is "I cannot do more with this because the physical thing is destroyed." Everything else is a "bonus" for them.
