base keyword breaks hash based URIs #179

Open
NicoRobertIn opened this issue Mar 22, 2024 · 6 comments

@NicoRobertIn

Issue Description:

The parser mangles the URI passed to the @base keyword when that URI is hash-based (ends with #).

Bug Details:

When the URI passed to the @base keyword in a Turtle file ends with a #, part of that URI is lost during parsing, which breaks every URI in the graph that relies on this base.

Steps to Reproduce:

  1. Create a Turtle file that declares a hash-based base URI and add a triple whose subject is a relative URI using this base. For example:
@base <https://example.org/route/disappeared#> .

<BrokenURI> a owl:Class .
  2. Load the file and run a CONSTRUCT query that retrieves that URI, for example:
construct {?s ?p ?o} where {?s ?p ?o }

Expected Behavior:

The broken URI should be <https://example.org/route/disappeared#BrokenURI>

Actual Behavior:

The following Turtle is returned:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ns1: <https://example.org/route/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

ns1:BrokenURI a owl:Class .

rdf:type a rdf:Property .

From this result we can conclude that the BrokenURI is now <https://example.org/route/BrokenURI>, which is different from <https://example.org/route/disappeared#BrokenURI>

Note to Developers:

This behaviour was tested on Corese python, Corese command and Corese GUI on different computers.

The same URI modification can also be seen with a simple select * where {?s ?p ?o} request

Screenshots/Attachments:

[three screenshots attached to the original issue]

@FabienGandon
Collaborator

I think the base URI must be an absolute URI, i.e. a URI with no fragment, hence no #.

@frmichel
Member

Indeed, RFC 3986 (Section 5.2.1) says: "A base URI must conform to the <absolute-URI> syntax rule (Section 4.3)."
Section 4.3 is not so easy to follow, but at least it says this:
"defining a base URI for later use by relative references calls for an absolute-URI syntax rule that does not allow a fragment."

@NicoRobertIn, I think that a base URI is not like a prefix: a prefix just expands to a URI by simple string concatenation, while the base URI is used to resolve relative URIs, and this is not only string concatenation.
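
To illustrate the difference between prefix expansion and base resolution, here is a minimal sketch. It assumes Python's standard `urllib.parse.urljoin` as a stand-in for RFC 3986 reference resolution; it is not Corese's parser, and the helper names `expand_prefixed_name` and `resolve_against_base` are hypothetical.

```python
from urllib.parse import urljoin

# The hash-based URI from the issue, used both as a namespace and as a base.
NAMESPACE = "https://example.org/route/disappeared#"


def expand_prefixed_name(namespace: str, local_name: str) -> str:
    """Prefix expansion: plain string concatenation."""
    return namespace + local_name


def resolve_against_base(base: str, relative_ref: str) -> str:
    """Base resolution: the fragment of the base is dropped and the
    relative reference replaces the last path segment."""
    return urljoin(base, relative_ref)


concatenated = expand_prefixed_name(NAMESPACE, "BrokenURI")
resolved = resolve_against_base(NAMESPACE, "BrokenURI")

print(concatenated)  # https://example.org/route/disappeared#BrokenURI
print(resolved)      # https://example.org/route/BrokenURI
```

Under that assumption, the resolved form is the same URI that appears in the reported output.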

@frmichel
Member

Additionally, to fix your problem, you should use:

@base <https://example.org/route/disappeared> .
<#BrokenURI> a owl:Class .
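
As a quick check of this workaround, the sketch below resolves the fragment-only reference against the fragment-less base. It again assumes Python's `urllib.parse.urljoin` approximates the resolution a Turtle parser performs; the variable names are only illustrative.

```python
from urllib.parse import urljoin

# Base without a fragment, as suggested above.
BASE = "https://example.org/route/disappeared"

# A fragment-only reference keeps the whole base and appends the fragment.
resolved_fragment = urljoin(BASE, "#BrokenURI")

# A plain relative name still replaces the last path segment.
resolved_name = urljoin(BASE, "BrokenURI")

print(resolved_fragment)  # https://example.org/route/disappeared#BrokenURI
print(resolved_name)      # https://example.org/route/BrokenURI
```

Only the `<#BrokenURI>` form yields the URI expected in the issue description.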

@MaillPierre
Contributor

MaillPierre commented Mar 22, 2024

The Turtle grammar says that "@base" must be followed by an IRIREF.
An IRIREF must match the following production:
'<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
Which I translate as the REGEX:
<([^\x00\-\x20<>"{}|^`\\]|\X)*>
This regex validates <https://example.org/route/disappeared#BrokenURI>.

Unless my regex is wrong, the Turtle recommendation says that the base URL used in OP's file is correct.

@frmichel
Member

frmichel commented Mar 22, 2024

Hmmm.... indeed. But there's something weird. When I add '#' to the set of forbidden characters, then the regex still matches.
<([^#\x00\-\x20<>"{}|^`\\]|\X)*>
How come?
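
One possible explanation, offered as a sketch: in PCRE-style engines `\X` matches any extended grapheme cluster, so the second alternative accepts every character and the negated class no longer constrains the match. Python's `re` module does not support `\X`, so the example below uses `[\s\S]` as a stand-in and a deliberately simplified character class; both are assumptions, not the regex from the thread.

```python
import re

TARGET = "<https://example.org/route/disappeared#BrokenURI>"

# Negated class forbidding '#', plus a catch-all alternative standing in for \X.
with_catch_all = re.compile(r"<(?:[^#<>]|[\s\S])*>")

# The same negated class without the catch-all alternative.
without_catch_all = re.compile(r"<(?:[^#<>])*>")

matches_with = with_catch_all.fullmatch(TARGET) is not None
matches_without = without_catch_all.fullmatch(TARGET) is not None

print(matches_with)     # True: '#' is consumed by the catch-all branch
print(matches_without)  # False: '#' is rejected by the negated class
```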

@MaillPierre
Contributor

MaillPierre commented Mar 22, 2024

@frmichel good catch, I updated the regex by decomposing the UCHAR regex: https://regex101.com/r/05Bh3v/3
<([^\x00\-\x20<>"{}|^`\\]|(\\u|\\U)([0-9]|[A-F]|[a-f]))*>
It still validates <https://example.org/route/disappeared#BrokenURI>
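
For comparison, here is a stricter transcription of the IRIREF production as a Python sketch. It assumes that `#x00-#x20` in the grammar denotes the character range U+0000 to U+0020 (so `#` itself is not excluded), and that UCHAR is `\u` followed by 4 hex digits or `\U` followed by 8. The regex above differs on both points as far as I can tell: the escaped hyphen `\-` turns the range into three single characters, and only one hex digit is matched.

```python
import re

# IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
# UCHAR  ::= '\u' HEX{4} | '\U' HEX{8}   (assumed from the Turtle grammar)
IRIREF = re.compile(
    r'<(?:[^\x00-\x20<>"{}|^`\\]'   # any character outside the excluded set
    r'|\\u[0-9A-Fa-f]{4}'           # \uXXXX escape
    r'|\\U[0-9A-Fa-f]{8})*>'        # \UXXXXXXXX escape
)


def is_iriref(token: str) -> bool:
    """Return True if the whole token matches the IRIREF sketch."""
    return IRIREF.fullmatch(token) is not None


print(is_iriref("<https://example.org/route/disappeared#>"))           # True
print(is_iriref("<https://example.org/route/disappeared#BrokenURI>"))  # True
print(is_iriref("<https://example.org/with space>"))                   # False
```

Under these assumptions the hash-based base from the issue is still syntactically valid as an IRIREF. Syntactic validity is a separate question from how the base is then used to resolve relative references.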
