RDFXML: can't parse rdf:XMLLiteral #2473
Comments
What is |
this is from |
How exactly? Maybe this is a "bug" in the sense that the behavior changed in 5.x, but since the data is CIMXML and not standard RDF/XML, I don't see why Jena should be able to parse it. If anything, it should fail earlier and provide a better error message. |
If you run the code, you will find that one of the sections is parsed as a URI, losing the link URI, and the second as a string literal. |
Probably, the workaround for client code could be |
This could be a major issue since the parser behaves completely differently in Jena 5.0 for CIM DifferenceModels. The nested triples under 'forwardDifferences' are all treated as a single XMLLiteral in Jena 4.8 but not in Jena 5.0. I replaced Jena 4.8 produces:
where Jena 5.0 produces:
I have no clue about the parsers and how they should behave. |
The CIM DifferenceModel Definition seems to date back to March 2001: https://lists.w3.org/Archives/Public/www-rdf-interest/2001Mar/0216.html. |
With 4.10.0 I get (I extracted the string from the java and ran
with 5.1.0-dev (which does have a related change):
Some outside-the-spec support is reasonable, but not a free-for-all handling of random eccentricities.
With corrections: (5.1.0 dev - there was a nearby change to RRX):
In Turtle:
|
@afs
Since I also don't have access to the CIM IEC specifications, I can only refer to ENTSO-E - CIM Conformity and Interoperability where the latest "Version 3 since August 2022" of the "CGMES Conformity Assessment Scheme v3" contains examples using "rdf:parseType="Statements"" in the Test Configurations v3.0.2 (e.g. ENTSO-E_Test_Configurations_v3.0.2/v3.0/FullGrid/FullGrid_OP_diff/FullGrid_OP_diff.xml). |
I think deprecation of I'm not familiar with CIM and don't know what the editors were thinking, but how can they expect interoperability if their spec is not aligned with RDF/XML 1.1? Someone should probably open an issue for CIM. |
From what I found,
I tried to register for the CIM user group to report the issue. Unfortunately, I could not find any working interactive elements, such as issues, discussions, contact forms, or anything else. Their pages seem to be quite restrictive and buggy. For me, the CIM-DifferenceModels were the primary reason I started using Apache Jena. It was then that I first realized that CIM is not merely an XML format, but is fundamentally based on the concepts of RDF and graphs. Using DifferenceModels in conjunction with hardcoded entities (often generated from UML or XSD) is almost impossible or, at least, very cumbersome. |
Hi @arne - thank you for the "ENTSO-E - CIM Conformity and Interoperability" reference and the test suite details. tl;dr: there is a case for allowing "Statements". Handling bare XML attribute
In the report, there is a bare
RDF 1.0 RDF/XML (https://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-XML-literals) does not allow it. The ARP warning that it is deprecated goes back a long time, to before RDF 1.0. I can't find a report of the deprecation; it's probably in email archives from before 2004. This is unrelated to the use of "Statements". So the questions I have for bare
The link seems to imply it's faulty data. But if it is common+common, saying "normal for CIM", then adding it directly to RRX still has to answer whether it should allow non-standard usage - it is a bad use of XML. One possibility - add a "CIM" language with a "CIM Reader" that inherits from ParserRDFXML_SAX with customization hooks (or a context setting). BTW, the RDF/XML writers don't write the bare form. They will make up a namespace name if the data has not defined in the graph for
For "Statements", it is in a current spec and that could be added to the RRX parsers. The tests don't have The information about the word used to create the |
#2477 adds
|
I think it is fair to say: All occurrences of the bare Google discovered an old draft of the IEC standard Part 5xx: CIM XML Model Exchange Format . For example, on page 21, the RDF example lacks the namespace, but the text explicitly mentions the correct namespace.
Today I also discovered the RDF Syntax User Guide listed on https://www.entsoe.eu/data/cim/cim-for-grid-models-exchange/, which contains a chapter "General differences between CIM XML (552) and RDF XML (W3C)". This chapter explains a lot right at the beginning:
It lists many of the issues I've stumbled upon over the last few years. Unfortunately, the rdf:parseType="Statements" issue is missing from this chapter. There, I also found the latest Metadata and Document Header Data Exchange Specification, which includes parts of the Difference Model Specification:
Description for "forwardDifferences":
With Jena, I would want to be able to parse the forwardDifferences and backwardDifferences into separate graphs, so that I can apply them as additions and deletions, or use them in a Here is sample code to illustrate a way to read the forwardDifferences as a graph:
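For readers without Jena at hand, here is a rough sketch of one way to isolate a difference section using only the JDK's DOM API. It is not Jena code and not necessarily the approach of the sample referred to above. It copies the statements nested under an element such as `forwardDifferences` into a fresh `rdf:RDF` document, which a standard RDF/XML parser should then accept as an ordinary graph. The class and method names are hypothetical, and the `dm:` and `cim:` namespace URIs and the sample data in `main` are illustrative assumptions, not taken from this thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Jena.
class DifferenceSectionExtractor {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    private static DocumentBuilder newBuilder() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-based lookups
            return factory.newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException("No usable XML parser", e);
        }
    }

    /** Parses XML text into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /**
     * Copies the child elements of the first sectionNs:sectionName element
     * into a new rdf:RDF document and returns that document as text.
     * If no such element exists, the result is an empty rdf:RDF document.
     */
    static String extractSection(String cimXml, String sectionNs, String sectionName) {
        Document source = parse(cimXml);
        Document target = newBuilder().newDocument();
        Element root = target.createElementNS(RDF_NS, "rdf:RDF");
        target.appendChild(root);

        NodeList sections = source.getElementsByTagNameNS(sectionNs, sectionName);
        if (sections.getLength() > 0) {
            NodeList children = sections.item(0).getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    root.appendChild(target.importNode(child, true)); // deep copy
                }
            }
        }
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(target), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize extracted section", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative data only; both namespace URIs below are assumptions.
        String dm = "http://iec.ch/TC57/61970-552/DifferenceModel/1#";
        String xml = "<rdf:RDF xmlns:rdf='" + RDF_NS + "' xmlns:dm='" + dm + "'"
            + " xmlns:cim='http://iec.ch/TC57/CIM100#'>"
            + "<dm:DifferenceModel rdf:about='urn:uuid:example-diff'>"
            + "<dm:forwardDifferences rdf:parseType='Statements'>"
            + "<cim:ACLineSegment rdf:about='#_line1'/>"
            + "</dm:forwardDifferences>"
            + "</dm:DifferenceModel></rdf:RDF>";
        System.out.println(extractSection(xml, dm, "forwardDifferences"));
    }
}
```

The same call with the name of the other difference section would give the deletions. Relative identifiers such as `#_line1` still need a base URI when the extracted document is parsed as RDF/XML.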
This process is not pretty, and writing DifferenceModels might even be uglier. Accepting
Next Steps?
I would like to approach ENTSO-E so that they may extend their chapter "General differences between CIM XML (552) and RDF XML (W3C)." Proposing the use of rdf:parseType="Literal" to improve general compatibility with RDF/XML could be one suggestion. However, does anyone have a good idea on how to express DifferenceModels with nested graphs for forwardDifferences and backwardDifferences in RDF/XML? (My knowledge of the standards and parsers is not sufficient to devise a practical solution.) |
Some interesting points in those docs about the differences.
One way to encode the difference is to use named graphs: if for JSON-LD, then using a blank node graph name is natural. This uses RDF datasets as "packages of graphs" - the default graph is the manifest and main data, and the named graphs are sets of triples referred to from the default graph. However, there isn't a standard dataset/quads syntax based on XML. (TriX is a sort of de-facto standard but it isn't pretty.) Based on the package idea, there is always a zip file with named files as named graphs. Not "a standard" but it only needs basic, common tools to work with.
TriG:
|
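To make the "package of graphs" idea concrete, here is a minimal sketch in plain Java, with no RDF library involved. It treats a difference as two named sets of triples and applies them to a base graph, removing the backward set and then adding the forward set. The `Triple` record, the graph names `forward` and `backward`, and the class name are all hypothetical choices for illustration; real data would use graph names (or blank nodes) referred to from the default graph.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a difference "package": graph name -> set of triples.
class DifferencePackage {
    /** A triple with all three terms kept as plain strings for simplicity. */
    record Triple(String subject, String predicate, String object) {}

    // Assumed graph names; nothing in the thread fixes these.
    static final String FORWARD = "forward";
    static final String BACKWARD = "backward";

    /**
     * Returns a new set: base minus the backward graph, plus the forward graph.
     * A missing graph is treated as empty. The base set is not modified.
     */
    static Set<Triple> apply(Set<Triple> base, Map<String, Set<Triple>> namedGraphs) {
        Set<Triple> result = new LinkedHashSet<>(base);
        result.removeAll(namedGraphs.getOrDefault(BACKWARD, Set.of())); // deletions
        result.addAll(namedGraphs.getOrDefault(FORWARD, Set.of()));     // additions
        return result;
    }

    public static void main(String[] args) {
        Triple oldName = new Triple("#_line1", "cim:IdentifiedObject.name", "Line 1");
        Triple newName = new Triple("#_line1", "cim:IdentifiedObject.name", "Line 1a");
        Set<Triple> base = Set.of(oldName);
        Map<String, Set<Triple>> diff = Map.of(
            BACKWARD, Set.of(oldName),
            FORWARD, Set.of(newName));
        System.out.println(apply(base, diff));
    }
}
```

Deletions are applied before additions here, so a triple that appears in both graphs ends up present in the result; whether that is the right rule for CIM difference models is an assumption of this sketch.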
PR #2477 does not cover bare The test files Test Configurations v3.0.2 do not use un-namespaced I'm considering this bug report as "done" until there is new information about bare |
Version
5.0.0
What happened?
The following code works fine in Jena 4.10.0, but fails in Jena 5.0.0
Is there any workaround? This issue blocks upgrading the Jena version.
Relevant output and stacktrace