Schema.org is inconsistent about http and https in URIs for terms #550

Open
timbl opened this issue Feb 24, 2022 · 3 comments

Comments

@timbl
Member

timbl commented Feb 24, 2022

The policy in Solid has so far been to leave all the historical namespaces (those at http://www.w3.org/, Dublin Core, etc.) in http: space, and to actually fetch the ontology from https: space, either by being redirected at fetch time or by having code which just adds the s before the fetch for any ontology. This has worked fine, and all the RDF terms match correctly. Yes, in RDF if you add an 's' to the identifier then it is a different thing and won't work. So don't! :-)
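To make that policy concrete, here is a minimal sketch in Python (not the actual Solid or rdflib code). The function name document_url_for_fetch is hypothetical. It only shows the idea: the term URI is never rewritten, and the scheme is upgraded only on the URL that gets dereferenced.

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit


def document_url_for_fetch(term_uri: str) -> str:
    """Return the URL to dereference for a term (hypothetical helper).

    The term URI itself is left untouched; only the fetch location
    is upgraded from http: to https:.
    """
    # Drop the fragment: the ontology document is what gets fetched.
    doc_uri, _fragment = urldefrag(term_uri)
    parts = urlsplit(doc_uri)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


term = "http://www.w3.org/2000/01/rdf-schema#label"

# The fetch goes to https: space ...
print(document_url_for_fetch(term))
# → https://www.w3.org/2000/01/rdf-schema

# ... but term identity is still plain string comparison,
# so adding an 's' to the identifier yields a different term.
print(term == "https://www.w3.org/2000/01/rdf-schema#label")
# → False
```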

The same had applied to schema.org.

But now schema.org is changing at least its human-readable documentation to use https. This breaks everything, as it fills the world with a mixture of the two, which will break code, queries and data. We need to decide what to do. For example,

  • Persuade the schema.org folks to go back to using http: uniquely
  • Persuade them to switch to using https: consistently and add special-case conversion of old data
  • Decide we will violate RDF's tradition of not looking at URIs and build systems which treat the two as the same, for schema.org and anything else. Canonicalize any http: or https: identifier to be always http: (or always https:).
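The third option could be sketched roughly as follows in Python. This is an illustration only, not rdflib's existing canonicalization code; the names CANONICAL_SCHEME, SCHEME_INSENSITIVE_HOSTS and canonicalize are assumptions, as is the choice of http as the canonical form.

```python
from urllib.parse import urlsplit

# Assumptions for illustration: which scheme wins, and for which hosts
# http: and https: are treated as naming the same thing.
CANONICAL_SCHEME = "http"
SCHEME_INSENSITIVE_HOSTS = {"schema.org"}


def canonicalize(uri: str) -> str:
    """Map http:/https: variants of a URI on a listed host to one form."""
    parts = urlsplit(uri)
    if (parts.scheme in ("http", "https")
            and parts.hostname in SCHEME_INSENSITIVE_HOSTS):
        # Swap only the scheme prefix so the rest of the URI
        # is preserved character for character.
        return CANONICAL_SCHEME + uri[len(parts.scheme):]
    return uri


print(canonicalize("https://schema.org/Person"))
# → http://schema.org/Person
print(canonicalize("http://schema.org/Person"))
# → http://schema.org/Person
print(canonicalize("https://example.org/Person"))  # host not listed
# → https://example.org/Person
```

Restricting the rewrite to an explicit host list keeps every other URI opaque, which limits how far the sketch departs from the RDF tradition.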

rdflib already has some canonicalization code: for example, when one URI is redirected ('moved') to another, it just uses the second.

If we added the ability to switch out old versions of terms for new ones, then we could also use it for moving between old and new ontologies: when you patched a file, the server could switch out the old terms in it. Could be useful. But RDF purists would maybe not like it at all, and systems which did not canonicalize internally would have to be protected by having canonicalizing adapters upstream of them.
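A canonicalizing adapter of that kind might look something like the Python sketch below. It is a simplification under stated assumptions: triples are plain tuples of strings, literals are not distinguished from URIs, and rewrite_terms and the example.org terms are hypothetical names made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Simplification: every term is a plain string. A real store would
# distinguish URIs from literals and never rewrite literal values.
Triple = Tuple[str, str, str]


def rewrite_terms(triples: Iterable[Triple],
                  replacements: Dict[str, str]) -> Iterator[Triple]:
    """Yield each triple with old terms switched out for new ones."""
    for s, p, o in triples:
        yield (replacements.get(s, s),
               replacements.get(p, p),
               replacements.get(o, o))


# Hypothetical old-to-new mapping, for illustration only.
replacements = {
    "http://example.org/old#name": "http://example.org/new#name",
}

incoming = [
    ("http://example.org/alice", "http://example.org/old#name", "Alice"),
]

print(list(rewrite_terms(incoming, replacements)))
# → [('http://example.org/alice', 'http://example.org/new#name', 'Alice')]
```

Placed upstream of a system that does not canonicalize internally, such an adapter would let that system keep treating URIs as opaque.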

(This arose in discussion of #536 and #534 )

@NoelDeMartin
Contributor

Possibly related issue in schema.org's repo: schemaorg/schemaorg#2886

@pduchesne
Contributor

As the original poster of the issue and PR above, I'd like to give some context, and my take on the suggestions above, FWIW.

This issue arose in a project where we are using schema.org to model our data structures, and the dual use of the http and https schemes struck us as a problem. Having two coexisting URI schemes for the same entities is an obvious blocker for interoperability. So we looked for guidance on what to use, and indeed followed the latest schema.org recommendations, which mention the use of the https scheme. That's for the context.

As for the solution, in my opinion, it feels wrong to break the RDF tradition and proactively canonicalize URIs to either form. Or at least a generic toolkit or platform like rdflib or Solid is not the place to do so. These should stay agnostic and deal with opaque URIs.
The URI scheme should ideally be harmonized and mandated by SDO (of course that means a transition period and cumbersome legacy data in the wild; no idea whether that's realistic).
If that can't be, it is up to anyone designing a domain ontology relying on SDO to make the call and mandate the use of either http or https, and then hope for an emerging agreement between domains on what should be standard.

So, in the context of our particular use case, our first priority is to settle on the most interoperable solution, and if that means switching back to http, that's fine. And at the same time, the PR above still holds because rdflib should stay generic and agnostic to that.

@bruceweir

What was the motivation for making http and https resources indicate different things in RDF? Shouldn't the data be independent of the protocol used to deliver it?
