RDFXML: can't parse rdf:XMLLiteral #2473

Closed
sszuev opened this issue May 13, 2024 · 17 comments
@sszuev
Contributor

sszuev commented May 13, 2024

Version

5.0.0

What happened?

The following code works fine in Jena 4.10.0, but fails in Jena 5.0.0

        String xml = "<rdf:RDF \n" +
                "    xml:base=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\"\n" +
                "    xmlns:dm=\"http://iec.ch/2002/schema/CIM_difference_model#\" \n" +
                "    xmlns:md=\"http://iec.ch/TC57/61970-552/ModelDescription/1#\"\n" +
                "    xmlns:cim=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\"\n" +
                "    xmlns:meta=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\"\n" +
                "    xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n" +
                "<dm:DifferenceModel rdf:about=\"#_248c809d-1d7b-397c-830f-6928007ae6d9\">                \n" +
                "<md:Model.version>1715589426</md:Model.version>\n" +
                "<md:Model.created>2024-05-13T08:37:06.830Z</md:Model.created>\n" +
                "<md:Model.scenarioTime>2024-05-13T08:37:06.830Z</md:Model.scenarioTime>\n" +
                "<md:Model.profile>http://profile/</md:Model.profile>\n" +
                "<md:Model.modelingAuthoritySet>unknown</md:Model.modelingAuthoritySet>\n" +
                "<meta:Model.modelVersionIri>2021-1</meta:Model.modelVersionIri>\n" +
                "<meta:Model.differenceFrom>2024-04-01T07:55:06.779475Z</meta:Model.differenceFrom>\n" +
                "<meta:Model.differenceTo>2027-10-01T08:37:06.779475Z</meta:Model.differenceTo>\n" +
                "<dm:forwardDifferences parseType=\"Statements\">\n" +
                "<cim:A rdf:about=\"#_individual-A-1\">\n" +
                "<cim:A-2-B rdf:resource=\"#_individual-B-1\"/>\n" +
                "</cim:A>\n" +
                "<cim:B rdf:about=\"#_individual-B-1\"/>\n" +
                "<cim:D rdf:about=\"#_individual-D-1\"/>\n" +
                "</dm:forwardDifferences>\n" +
                "<dm:reverseDifferences parseType=\"Statements\">\n" +
                "</dm:reverseDifferences>\n" +
                "</dm:DifferenceModel>\n" +
                "</rdf:RDF> ";


        Model res = ModelFactory.createDefaultModel();
        RDFParserBuilder
                .create()
                .fromString(xml)
                .forceLang(Lang.RDFXML)
                .build()
                .parse(res);

        res.write(System.out, "ttl");

        // dm:forwardDifferences & dm:reverseDifferences expected to be rdf:XMLLiteral literals
        Literal literal = res.listStatements(
                null,
                ResourceFactory.createProperty("http://iec.ch/2002/schema/CIM_difference_model#forwardDifferences"),
                (RDFNode) null
        ).mapWith(Statement::getLiteral).toList().get(0);

        System.out.println(literal.getLexicalForm());

Is there any workaround? This issue blocks upgrading Jena version.

Relevant output and stacktrace

Exception in thread "main" org.apache.jena.rdf.model.LiteralRequiredException: http://iec.ch/TC57/2014/CIM-schema-cim16#_individual-A-1
	at org.apache.jena.rdf.model.impl.StatementImpl.getLiteral(StatementImpl.java:101)
	at org.apache.jena.util.iterator.Map1Iterator.lambda$forEachRemaining$0(Map1Iterator.java:55)
	at org.apache.jena.util.iterator.Map1Iterator.lambda$forEachRemaining$0(Map1Iterator.java:55)
	at org.apache.jena.mem.ArrayBunch$2.forEachRemaining(ArrayBunch.java:129)
	at org.apache.jena.util.iterator.WrappedIterator.forEachRemaining(WrappedIterator.java:113)
	at org.apache.jena.mem.TrackingTripleIterator.forEachRemaining(TrackingTripleIterator.java:58)
	at org.apache.jena.util.iterator.Map1Iterator.forEachRemaining(Map1Iterator.java:54)
	at org.apache.jena.util.iterator.WrappedIterator.forEachRemaining(WrappedIterator.java:113)
	at org.apache.jena.util.iterator.Map1Iterator.forEachRemaining(Map1Iterator.java:54)
	at org.apache.jena.util.iterator.NiceIterator.asList(NiceIterator.java:241)
	at org.apache.jena.util.iterator.NiceIterator.toList(NiceIterator.java:214)
	at com.gitlab.sszuev.Main.main(Main.java:59)


Are you interested in making a pull request?

None
@sszuev sszuev added the bug label May 13, 2024
@namedgraph
Contributor

What is parseType="Statements"? I don't see this specified anywhere in the RDF 1.1 XML Syntax.

@sszuev
Contributor Author

sszuev commented May 13, 2024

What is parseType="Statements"? I don't see this specified anywhere in the RDF 1.1 XML Syntax.

This is from Part 552: CIMXML Model exchange format specification.
Anyway, the parse result in Jena 5.0.0 seems incorrect regardless of the parseType attribute. In Jena 4.10.0, parsing does not work at all if there is no parseType.

@namedgraph
Contributor

Anyway, the parse result in Jena 5.0.0 seems incorrect regardless of the parseType attribute.

How exactly?

Maybe this is a "bug" in the sense that the behavior changed in 5.x, but since the data is CIMXML and not standard RDF/XML, I don't see why Jena should be able to parse it. If anything, it should fail earlier and provide a better error message.

@sszuev
Contributor Author

sszuev commented May 13, 2024

Maybe this is a "bug" in the sense that the behavior changed in 5.x, but since the data is CIMXML and not standard RDF/XML, I don't see why Jena should be able to parse it. If anything, it should fail earlier and provide a better error message.

If you run the code, you will find that one of the sections is parsed as a URI, losing the link, and the second as a string literal.
rdf:Statement is a part of RDF, but I'm not a specialist.
CIMXML is RDF.

@sszuev
Contributor Author

sszuev commented May 13, 2024

Probably, the workaround for client code could be replace("parseType=\"Statements\"","rdf:parseType=\"Literal\"")
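
A minimal sketch of that client-side workaround, in plain Java with no Jena calls. The class name ParseTypeWorkaround and the method rewrite are hypothetical. It assumes the rdf prefix is bound to the RDF namespace in the document, and it is a blunt text rewrite: it would also touch a matching string inside text content or comments.

```java
import java.util.regex.Pattern;

final class ParseTypeWorkaround {
    // Matches parseType="Statements" written bare or with the rdf: prefix,
    // in either quote style. The lookbehind skips other prefixes (foo:parseType).
    private static final Pattern STATEMENTS =
            Pattern.compile("(?<![\\w:.-])(?:rdf:)?parseType\\s*=\\s*([\"'])Statements\\1");

    /** Rewrites the non-standard attribute to the standard rdf:parseType="Literal". */
    static String rewrite(String rdfXml) {
        return STATEMENTS.matcher(rdfXml).replaceAll("rdf:parseType=$1Literal$1");
    }

    public static void main(String[] args) {
        String in = "<dm:forwardDifferences parseType=\"Statements\">";
        System.out.println(rewrite(in));
    }
}
```

The rewritten string can then be handed to the parser in place of the original input. The original spelling of the attribute is lost, so this only suits reading, not round-tripping.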

@arne-bdt

This could be a major issue since the parser behaves completely differently in Jena 5.0 for CIM DifferenceModels.

The nested triples under 'forwardDifferences' are all treated as a single XMLLiteral in Jena 4.8 but not in Jena 5.0.
To explain the background: DifferenceModels work like rdf-patch. They contain a list of added triples (A/forwardDifferences) and a list of triples to delete (D/reverseDifferences).
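
To make the patch analogy concrete, here is a small sketch in plain Java. The Triple record and the apply method are hypothetical stand-ins, not Jena types, and the order (delete first, then add) is an assumption for illustration rather than something taken from the CIM specification.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class DifferenceSketch {
    /** A triple reduced to three strings; a stand-in for a real RDF triple type. */
    record Triple(String s, String p, String o) {}

    /** Returns a new set: base minus the reverse differences, plus the forward differences. */
    static Set<Triple> apply(Set<Triple> base, Set<Triple> forward, Set<Triple> reverse) {
        Set<Triple> result = new LinkedHashSet<>(base);
        result.removeAll(reverse); // triples to delete
        result.addAll(forward);    // triples to add
        return result;
    }

    public static void main(String[] args) {
        Triple a = new Triple("cim:_individual-A-1", "rdf:type", "cim:A");
        Triple b = new Triple("cim:_individual-B-1", "rdf:type", "cim:B");
        Triple c = new Triple("cim:_individual-C-1", "rdf:type", "cim:C");
        // Base holds A and C; the difference model adds B and deletes C.
        System.out.println(apply(Set.of(a, c), Set.of(b), Set.of(c)));
    }
}
```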

I replaced res.write(System.out, "ttl"); with RDFDataMgr.write(System.out, res, RDFFormat.JSONLD11_PRETTY); and ran the code using Jena 4.8 and Jena 5.0.

Jena 4.8 produces:

{
    "@id": "cim:_248c809d-1d7b-397c-830f-6928007ae6d9",
    "@type": "dm:DifferenceModel",
    "md:Model.modelingAuthoritySet": "unknown",
    "cim:Model.modelVersionIri": "http://ontology.adms.ru/UIP/md/2021-1",
    "md:Model.version": "1715589426",
    "cim:Model.differenceTo": "2027-10-01T08:37:06.779475Z",
    "md:Model.created": "2024-05-13T08:37:06.830Z",
    "md:Model.profile": "http://profile/",
    "md:Model.scenarioTime": "2024-05-13T08:37:06.830Z",
    "dm:forwardDifferences": {
        "@value": "\n<cim:A xmlns:cim=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" rdf:about=\"#_individual-A-1\">\n<cim:A-2-B rdf:resource=\"#_individual-B-1\"></cim:A-2-B>\n</cim:A>\n<cim:B xmlns:cim=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" rdf:about=\"#_individual-B-1\"></cim:B>\n<cim:D xmlns:cim=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" rdf:about=\"#_individual-D-1\"></cim:D>\n",
        "@type": "rdf:XMLLiteral"
    },
    "cim:Model.differenceFrom": "2024-04-01T07:55:06.779475Z",
    "dm:reverseDifferences": {
        "@value": "\n",
        "@type": "rdf:XMLLiteral"
    },
    "@context": {
        "dm": "http://iec.ch/2002/schema/CIM_difference_model#",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "cim": "http://iec.ch/TC57/2014/CIM-schema-cim16#",
        "meta": "http://iec.ch/TC57/2014/CIM-schema-cim16#",
        "md": "http://iec.ch/TC57/61970-552/ModelDescription/1#"
    }
}

where Jena 5.0 produces:

{
    "@graph": [
        {
            "@id": "cim:_individual-D-1",
            "@type": "cim:D"
        },
        {
            "@id": "cim:_individual-A-1",
            "cim:A-2-B": {
                "@id": "cim:_individual-B-1"
            },
            "@type": "cim:A"
        },
        {
            "@id": "cim:_individual-B-1",
            "@type": "cim:B"
        },
        {
            "@id": "cim:_248c809d-1d7b-397c-830f-6928007ae6d9",
            "md:Model.version": "1715589426",
            "cim:Model.differenceTo": "2027-10-01T08:37:06.779475Z",
            "@type": "dm:DifferenceModel",
            "md:Model.profile": "http://profile/",
            "cim:Model.modelVersionIri": "http://ontology.adms.ru/UIP/md/2021-1",
            "md:Model.scenarioTime": "2024-05-13T08:37:06.830Z",
            "dm:reverseDifferences": "\n",
            "cim:Model.differenceFrom": "2024-04-01T07:55:06.779475Z",
            "dm:forwardDifferences": {
                "@id": "cim:_individual-A-1"
            },
            "md:Model.modelingAuthoritySet": "unknown",
            "md:Model.created": "2024-05-13T08:37:06.830Z"
        }
    ],
    "@context": {
        "dm": "http://iec.ch/2002/schema/CIM_difference_model#",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "cim": "http://iec.ch/TC57/2014/CIM-schema-cim16#",
        "meta": "http://iec.ch/TC57/2014/CIM-schema-cim16#",
        "md": "http://iec.ch/TC57/61970-552/ModelDescription/1#"
    }
}

I have no clue about the parsers and how they should behave.

@arne-bdt

arne-bdt commented May 13, 2024

The CIM DifferenceModel Definition seems to date back to March 2001: https://lists.w3.org/Archives/Public/www-rdf-interest/2001Mar/0216.html.
The first version of RDF/XML Syntax that I could find is from September 2001: https://www.w3.org/TR/2001/WD-rdf-syntax-grammar-20010906/.
I really don't know why the old parser in Jena supported rdf:parseType="Statements", but since September 2001, rdf:parseType="Literal" seems to be the correct syntax.

@afs
Member

afs commented May 13, 2024

With 4.10.0 I get the following (I extracted the string from the Java and ran riot from the command line):

15:01:05 WARN  riot            :: [line: 17, col: 47] {W102} unqualified use of rdf:parseType is deprecated.
15:01:05 WARN  riot            :: [line: 17, col: 47] {W106} Unknown rdf:parseType: 'Statements' (treated as 'Literal'.

with 5.1.0-dev (which does have a related change):

15:02:55 WARN  riot            :: [line: 17, col: 47] XML attribute 'parseType' used for RDF property attribute - ignored
15:02:55 WARN  riot            :: [line: 24, col: 47] XML attribute 'parseType' used for RDF property attribute - ignored

Some outside-the-spec support is reasonable, but not a free-for-all handling of random eccentricities.

  1. Is "Statements" documented anywhere public? CIM is pay-for. What is the latest CIM? Errata?
  2. Deprecation of parseType goes back to RDF 1.0, IIUC.

With corrections: (5.1.0 dev - there was a nearby change to RRX):

{
    "@id": "cim:_248c809d-1d7b-397c-830f-6928007ae6d9",
    "dm:reverseDifferences": {
        "@value": "\n",
        "@type": "rdf:XMLLiteral"
    },
    "dm:forwardDifferences": {
        "@value": "\n<cim:A xmlns:cim=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" rdf:about=\"#_individual-A-1\">\n<cim:A-2-B rdf:resource=\"#_individual-B-1\"></cim:A-2-B>\n</cim:A>\n<cim:B xmlns:cim=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" rdf:about=\"#_individual-B-1\"></cim:B>\n<cim:D xmlns:cim=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" rdf:about=\"#_individual-D-1\"></cim:D>\n",
        "@type": "rdf:XMLLiteral"
    },
    "cim:Model.differenceTo": "2027-10-01T08:37:06.779475Z",
    "cim:Model.differenceFrom": "2024-04-01T07:55:06.779475Z",
    "cim:Model.modelVersionIri": "http://ontology.adms.ru/UIP/md/2021-1",
    "md:Model.modelingAuthoritySet": "unknown",
    "md:Model.profile": "http://profile/",
    "md:Model.scenarioTime": "2024-05-13T08:37:06.830Z",
    "md:Model.created": "2024-05-13T08:37:06.830Z",
    "md:Model.version": "1715589426",
    "@type": "dm:DifferenceModel",
    "@context": {
        "dm": "http://iec.ch/2002/schema/CIM_difference_model#",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "cim": "http://iec.ch/TC57/2014/CIM-schema-cim16#",
        "meta": "http://iec.ch/TC57/2014/CIM-schema-cim16#",
        "md": "http://iec.ch/TC57/61970-552/ModelDescription/1#"
    }
}

In Turtle:

PREFIX cim:  <http://iec.ch/TC57/2014/CIM-schema-cim16#>
PREFIX dm:   <http://iec.ch/2002/schema/CIM_difference_model#>
PREFIX md:   <http://iec.ch/TC57/61970-552/ModelDescription/1#>
PREFIX meta: <http://iec.ch/TC57/2014/CIM-schema-cim16#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

meta:_248c809d-1d7b-397c-830f-6928007ae6d9
        rdf:type                       dm:DifferenceModel;
        dm:forwardDifferences          "\n<cim:A xmlns:cim=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" rdf:about=\"#_individual-A-1\">\n<cim:A-2-B rdf:resource=\"#_individual-B-1\"></cim:A-2-B>\n</cim:A>\n<cim:B xmlns:cim=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" rdf:about=\"#_individual-B-1\"></cim:B>\n<cim:D xmlns:cim=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" rdf:about=\"#_individual-D-1\"></cim:D>\n"^^rdf:XMLLiteral;
        dm:reverseDifferences          "\n"^^rdf:XMLLiteral;
        meta:Model.differenceFrom      "2024-04-01T07:55:06.779475Z";
        meta:Model.differenceTo        "2027-10-01T08:37:06.779475Z";
        meta:Model.modelVersionIri     "http://ontology.adms.ru/UIP/md/2021-1";
        md:Model.created               "2024-05-13T08:37:06.830Z";
        md:Model.modelingAuthoritySet  "unknown";
        md:Model.profile               "http://profile/";
        md:Model.scenarioTime          "2024-05-13T08:37:06.830Z";
        md:Model.version               "1715589426" .

@arne-bdt

@afs
I don't understand parts of your answer:

  • Is there a "deprecation of parseType"? To me, rdf:parseType="Literal" does not seem to be deprecated.
  • "With corrections: (5.1.0 dev - there was a nearby change to RRX)"
    --> The result looks okay, but what corrections? I checked out "main" with "5.1.0-SNAPSHOT" and still got the error.

Since I also don't have access to the CIM IEC specifications, I can only refer to ENTSO-E - CIM Conformity and Interoperability where the latest "Version 3 since August 2022" of the "CGMES Conformity Assessment Scheme v3" contains examples using "rdf:parseType="Statements"" in the Test Configurations v3.0.2 (e.g. ENTSO-E_Test_Configurations_v3.0.2/v3.0/FullGrid/FullGrid_OP_diff/FullGrid_OP_diff.xml).

@namedgraph
Contributor

namedgraph commented May 14, 2024

Is there a "deprecation of parseType"? To me, rdf:parseType="Literal" does not seem to be deprecated.

I think deprecation of rdf:parseType="Statements" was meant here. rdf:parseType="Literal" is fine since it's in RDF/XML 1.1.

I'm not familiar with CIM and don't know what the editors were thinking, but how can they expect interoperability if their spec is not aligned with RDF/XML 1.1? Someone should probably open an issue for CIM.

@rvesse
Member

rvesse commented May 14, 2024

@arne-bdt I think the corrections @afs is referring to are those in #2431 to address #2430

@arne-bdt

Is there a "deprecation of parseType"? To me, rdf:parseType="Literal" does not seem to be deprecated.

I think deprecation of rdf:parseType="Statements" was meant here. rdf:parseType="Literal" is fine since it's in RDF/XML 1.1.

I'm not familiar with CIM and don't know what the editors were thinking, but how can they expect interoperability if their spec is not aligned with RDF/XML 1.1? Someone should probably open an issue for CIM.

From what I found, rdf:parseType="Statements" cannot be deprecated since it was never part of RDF/XML. Using Google, I found one match which explains a bit about the history:
The paper RDF/XML SOURCE DECLARATION says:

[...] In another attempt, (De Vos, 2001) proposed to annotate property elements with the attribute rdf:parseType="Statements" to indicate that their content is considered the same as the content model of the rdf:RDF element but that the statements are in a separate context. [...]

I tried to register for the CIM user group to report the issue. Unfortunately, I could not find any working interactive elements, such as issues, discussions, contact forms, or anything else. Their pages seem to be quite restrictive and buggy.

For me, the CIM-DifferenceModels were the primary reason I started using Apache Jena. It was then that I first realized that CIM is not merely an XML format, but is fundamentally based on the concepts of RDF and graphs. Using DifferenceModels in conjunction with hardcoded entities (often generated from UML or XSD) is almost impossible or, at least, very cumbersome.
A lot of the existing tools and solutions do not treat CIM as being based on RDF and graph principles. I suppose that’s why, for instance, there isn’t a single ENTSO-E CIM-based data exchange process in place today that supports the DifferenceModels.
I guess that is also one reason why, over all those years, they did not recognize that their DifferenceModel format is not compatible with the RDF/XML standard.

@afs
Member

afs commented May 16, 2024

Hi @arne - thank you for the "ENTSO-E - CIM Conformity and Interoperability" reference and the test suite details.

tl;dr: there is a case for allowing "Statements". Handling the bare XML attribute parseType in normal mode seems to be on dodgy ground.

@afs I don't understand parts of your answer:

  • Is there a "deprecation of parseType"? To me, rdf:parseType="Literal" does not seem to be deprecated.

In the report, there is a bare parseType and no default namespace. Correct is a namespaced XML attribute (e.g. rdf:parseType).

XML attribute 'parseType' used for RDF property attribute - ignored

RDF 1.0 RDF/XML (https://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-XML-literals) does not allow it. The fact that ARP warns it is deprecated goes back a long time, pre RDF 1.0. I can't find a report of the deprecation - it's probably in email archives pre-2004.

This is unrelated to the use of "Statements".
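
The bare versus namespaced distinction can be checked with the JDK's namespace-aware DOM parser. This is only an illustrative sketch: the class ParseTypeCheck and the method classify are hypothetical and have nothing to do with how RRX or ARP are implemented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class ParseTypeCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Parses one XML element and reports how parseType is written on it. */
    static String classify(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Element e = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            if (e.hasAttributeNS(RDF_NS, "parseType")) return "namespaced";
            if (e.hasAttributeNS(null, "parseType")) return "bare"; // attribute in no namespace
            return "absent";
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML: " + xml, ex);
        }
    }

    public static void main(String[] args) {
        String ns = " xmlns:rdf=\"" + RDF_NS + "\"";
        System.out.println(classify("<p" + ns + " rdf:parseType=\"Statements\"/>"));
        System.out.println(classify("<p" + ns + " parseType=\"Statements\"/>"));
    }
}
```

An unprefixed attribute is in no namespace at all, even when the element itself sits in a default namespace, which is why the two spellings are different attributes to an XML processor.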

So the questions I have for bare parseType:

  • How common is bare parseType even in CIM usage? Basically - is the data wrong? A one-off occurrence?
  • Is it corrected in later versions/errata? If we take ENTSO-E - CIM Conformity and Interoperability, then bare parseType is right only for data that has not been updated.

The link seems to imply it's faulty data.

But if it is common, say "normal for CIM", then adding it directly to RRX still has to answer whether RRX should allow non-standard usage - it is bad use of XML.

One possibility - add a "CIM" language with a "CIM Reader" that inherits from ParserRDFXML_SAX with customization hooks (or a context setting.)

BTW, the RDF/XML writers don't write the bare form. They will make up a namespace prefix if the graph has not defined one for http://www.w3.org/1999/02/22-rdf-syntax-ns#.

  • "With corrections: (5.1.0 dev - there was a nearby change to RRX)"
    --> The result looks okay, but what corrections? I checked out "main" with "5.1.0-SNAPSHOT" and still got the error.

Since I also don't have access to the CIM IEC specifications, I can only refer to ENTSO-E - CIM Conformity and Interoperability where the latest "Version 3 since August 2022" of the "CGMES Conformity Assessment Scheme v3" contains examples using "rdf:parseType="Statements"" in the Test Configurations v3.0.2 (e.g. ENTSO-E_Test_Configurations_v3.0.2/v3.0/FullGrid/FullGrid_OP_diff/FullGrid_OP_diff.xml).

For "Statements", it is in a current spec and that could be added to the RRX parsers. The tests don't have rdf:parseType="Literal". Does CIM/ENTSO-E specific software accept rdf:parseType="Literal"?

The information about the word used to create the rdf:XMLLiteral isn't retained. (And it would make a mess of term equality!) The RDF/XML writers won't write "Statements".

afs added a commit to afs/jena that referenced this issue May 17, 2024
afs added a commit to afs/jena that referenced this issue May 17, 2024
@afs
Member

afs commented May 17, 2024

#2477 adds rdf:parseType="Statements" to RRX.
It generates a warning: e.g.

WARN  riot            ::  [line: 8, col: 48] Encountered rdf:parseType='Statements'. Treated as rdf:parseType='literal'

@arne-bdt

I think it is fair to say: All occurrences of the bare parseType="Statements" are due to sloppy implementations or copy-and-paste errors. The attribute should always be written with its namespace, as rdf:parseType="Statements".

Google discovered an old draft of the IEC standard Part 5xx: CIM XML Model Exchange Format. For example, on page 21, the RDF example lacks the namespace, but the text explicitly mentions the correct namespace.

Just for clarification with the namespace “dm” new statements are introduced that are valid
extensions to the standard RDF syntax through the new property rdf:parseType, which is called
Statements.
<property parseType=”Statements”>
<!-- Content: (definition|description)* -->
</property>
The content model of an element with rdf:parseType=”Statements” is the same as the content
model of the rdf:RDF element.

Today I also discovered the RDF Syntax User Guide listed on https://www.entsoe.eu/data/cim/cim-for-grid-models-exchange/, which contains a chapter "General differences between CIM XML (552) and RDF XML (W3C)". This chapter explains a lot right at the beginning:

The CIM XML is defined in the IEC 61970-552:2016. This version of the standard is based on
a much earlier edition in which some serialization assumptions were made. Important: When
the initial version of IEC 61970-552 was developed, the W3C recommendations on RDF XML
were not released. Therefore, there was a growing gap during the last two decades. The latest
RDF XML was standardized by W3C in 2014 (RDF 1.1 XML Syntax (w3.org)) and IEC 61970-552
did not align with this due to existing implementations objecting changes in CIM XML.

It lists many of the issues I've stumbled upon over the last few years. Unfortunately, the rdf:parseType="Statements" issue is missing from this chapter.

There, I also found the latest Metadata and Document Header Data Exchange Specification, which includes parts of the Difference Model Specification:

The content [of the DifferenceModel] is described by the Model class, the
association role forwardDifferences and association role reverseDifferences. Both association
roles may have one set of Statements.
[chapter "7.3 (dm) DifferenceModel", page 33]

Description for "forwardDifferences":

A property of the difference model whose value is a collection of statements (i.e., resources of type rdf:Statement) representing the forward difference statements.

With Jena, I would want to be able to parse the forwardDifferences and reverseDifferences into separate graphs, so that I can apply them as additions and deletions, or use them in an org.apache.jena.graph.compose.Delta.

Here is sample code to illustrate a way to read the forwardDifferences as a graph:

String xml = "<rdf:RDF \n" +
        "    xml:base=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\"\n" +
        "    xmlns:dm=\"http://iec.ch/2002/schema/CIM_difference_model#\" \n" +
        "    xmlns:md=\"http://iec.ch/TC57/61970-552/ModelDescription/1#\"\n" +
        "    xmlns:cim=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\"\n" +
        "    xmlns:meta=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\"\n" +
        "    xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n" +
        "<dm:DifferenceModel rdf:about=\"#_248c809d-1d7b-397c-830f-6928007ae6d9\">                \n" +
        "<md:Model.version>1715589426</md:Model.version>\n" +
        "<md:Model.created>2024-05-13T08:37:06.830Z</md:Model.created>\n" +
        "<md:Model.scenarioTime>2024-05-13T08:37:06.830Z</md:Model.scenarioTime>\n" +
        "<md:Model.profile>http://profile/</md:Model.profile>\n" +
        "<md:Model.modelingAuthoritySet>unknown</md:Model.modelingAuthoritySet>\n" +
        "<meta:Model.modelVersionIri>http://ontology.adms.ru/UIP/md/2021-1</meta:Model.modelVersionIri>\n" +
        "<meta:Model.differenceFrom>2024-04-01T07:55:06.779475Z</meta:Model.differenceFrom>\n" +
        "<meta:Model.differenceTo>2027-10-01T08:37:06.779475Z</meta:Model.differenceTo>\n" +
        "<dm:forwardDifferences rdf:parseType=\"Statements\">\n" +
        "<cim:A rdf:about=\"#_individual-A-1\">\n" +
        "<cim:A-2-B rdf:resource=\"#_individual-B-1\"/>\n" +
        "</cim:A>\n" +
        "<cim:B rdf:about=\"#_individual-B-1\"/>\n" +
        "<cim:D rdf:about=\"#_individual-D-1\"/>\n" +
        "</dm:forwardDifferences>\n" +
        "<dm:reverseDifferences parseType=\"Statements\">\n" +
        "</dm:reverseDifferences>\n" +
        "</dm:DifferenceModel>\n" +
        "</rdf:RDF> ";


Model res = ModelFactory.createDefaultModel();
RDFParserBuilder
        .create()
        .fromString(xml)
        .forceLang(Lang.RDFXML)
        .build()
        .parse(res);

RDFDataMgr.write(System.out, res, RDFFormat.JSONLD11_PRETTY);

var subjDiffModel = NodeFactory.createURI("http://iec.ch/TC57/2014/CIM-schema-cim16#_248c809d-1d7b-397c-830f-6928007ae6d9");
var predForwardDifferences = NodeFactory.createURI("http://iec.ch/2002/schema/CIM_difference_model#forwardDifferences");
//get the forward differences
var objForwardDifferences = res.getGraph().find(subjDiffModel, predForwardDifferences, Node.ANY).next().getObject();

//create a new RDF/XML graph
var forwardRDF = "<rdf:RDF \n" +
        "    xml:base=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\"\n" +
        "    xmlns:dm=\"http://iec.ch/2002/schema/CIM_difference_model#\" \n" +
        "    xmlns:md=\"http://iec.ch/TC57/61970-552/ModelDescription/1#\"\n" +
        "    xmlns:cim=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\"\n" +
        "    xmlns:meta=\"http://iec.ch/TC57/2014/CIM-schema-cim16#\"\n" +
        "    xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n" +
        objForwardDifferences.getLiteralLexicalForm() +
        "</rdf:RDF>";

var forwardGraph = GraphFactory.createGraphMem();
forwardGraph.getPrefixMapping().setNsPrefixes(res.getNsPrefixMap());
RDFParserBuilder
        .create()
        .fromString(forwardRDF)
        .forceLang(Lang.RDFXML)
        .build()
        .parse(forwardGraph);


RDFDataMgr.write(System.out, forwardGraph, RDFFormat.JSONLD11_PRETTY);

This process is not pretty, and writing DifferenceModels might even be uglier.

Accepting rdf:parseType="Statements" seems to be a good compromise to ensure that Jena is regarded as a stable component in any CIM/XML tool chain.

Next steps?

I would like to approach ENTSO-E so that they may extend their chapter "General differences between CIM XML (552) and RDF XML (W3C)." Proposing the use of rdf:parseType="Literal" to improve general compatibility with RDF/XML could be one suggestion. However, does anyone have a good idea on how to express DifferenceModels with nested graphs for forwardDifferences and reverseDifferences in RDF/XML? (My knowledge of the standards and parsers is not sufficient to devise a practical solution.)

afs added a commit to afs/jena that referenced this issue May 17, 2024
afs added a commit to afs/jena that referenced this issue May 17, 2024
@afs
Member

afs commented May 18, 2024

Some interesting points in those docs about the differences.

  • rdf:ID, rdf:about -- syntax oriented, needs to hook into the parser. But if there is to be CIM/JSON-LD, then the distinction ought to be ignored or converted to a property because it is an RDF/XML specific feature.
  • xml:base -- This can be addressed in processing parser output with an RDFStream on the way to the model for storage. Use a URI with a private scheme, e.g. base:, as a marker. The data then has relative URIs. Use with care!
  • Schema-defined datatyping. Very much inspired by XML!
    It can be done by feeding the parse output to an RDFStream that looks for the properties with schema-defined datatype and converting the triple object from string to the datatype.
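
The schema-defined datatyping point can be sketched as a small stream stage. This is plain Java with a hypothetical LiteralTriple record and a Consumer standing in for the parser output stream; it is not Jena's stream API, and the property-to-datatype mapping in main is an assumed example rather than something taken from a CIM schema.

```java
import java.util.Map;
import java.util.function.Consumer;

final class DatatypeStreamSketch {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    /** Simplified triple with a literal object: lexical form plus datatype IRI. */
    record LiteralTriple(String s, String p, String lex, String datatype) {}

    /**
     * Wraps a downstream sink. Objects typed xsd:string whose property appears
     * in the schema map are re-emitted with the schema-defined datatype.
     */
    static Consumer<LiteralTriple> retyping(Map<String, String> datatypeByProperty,
                                            Consumer<LiteralTriple> sink) {
        return t -> {
            String dt = datatypeByProperty.get(t.p());
            boolean isString = (XSD + "string").equals(t.datatype());
            sink.accept(dt != null && isString
                    ? new LiteralTriple(t.s(), t.p(), t.lex(), dt)
                    : t);
        };
    }

    public static void main(String[] args) {
        // Assumed mapping, for illustration only.
        Map<String, String> schema = Map.of("md:Model.created", XSD + "dateTime");
        Consumer<LiteralTriple> stage = retyping(schema, System.out::println);
        stage.accept(new LiteralTriple("cim:m", "md:Model.created",
                "2024-05-13T08:37:06.830Z", XSD + "string"));
    }
}
```

Only the datatype changes; the lexical form is passed through untouched, so a value that is not valid for the target datatype would still need checking downstream.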

One way to encode the difference is to use named graphs: if for JSON-LD, then using a blank node graph name is natural. This is using RDF datasets as "packages of graphs" - the default graph is the manifest and main data, the named graphs are sets of triples referred to from the default graph.

However, there isn't a standard dataset/quads syntax based on XML. (TriX is a sort of de-facto standard but it isn't pretty.)

Based on the package idea, there is always a zip file with named files as named graphs. Not "a standard" but it only needs basic, common tools to work with.
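
A rough sketch of the zip-package idea using only java.util.zip. The class name and the file names (manifest.ttl, forward.ttl, reverse.ttl) are made up for illustration; no such layout is defined by CIM or by Jena.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class GraphPackageSketch {
    /** Writes one zip entry per graph; the entry name plays the role of the graph name. */
    static byte[] pack(Map<String, String> graphsByFileName) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> e : graphsByFileName.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
            zip.finish(); // flush the central directory before copying the bytes
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Reads the package back into a map of file name to serialized graph. */
    static Map<String, String> unpack(byte[] data) {
        Map<String, String> graphs = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                graphs.put(e.getName(), new String(zip.readAllBytes(), StandardCharsets.UTF_8));
            }
            return graphs;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> graphs = new LinkedHashMap<>();
        graphs.put("manifest.ttl", "# default graph: the DifferenceModel header\n");
        graphs.put("forward.ttl", "# triples to add\n");
        graphs.put("reverse.ttl", "# triples to delete\n");
        System.out.println(unpack(pack(graphs)).keySet());
    }
}
```

Each entry could hold Turtle or any other triples syntax, so a consumer would parse the manifest first and then load the files it refers to as separate graphs.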

=== Quads
{
    "@graph": [
        {
            "@id": "cim:_248c809d-1d7b-397c-830f-6928007ae6d9",
            "md:Model.created": "2024-05-13T08:37:06.830Z",
            "cim:Model.differenceTo": "2027-10-01T08:37:06.779475Z",
            "md:Model.version": "1715589426",
            "cim:Model.differenceFrom": "2024-04-01T07:55:06.779475Z",
            "cim:Model.modelVersionIri": "http://ontology.adms.ru/UIP/md/2021-1",
            "dm:forwardDifferences": "_:b0",
            "dm:reverseDifferences": {
                "@value": "\n    ",
                "@type": "rdf:XMLLiteral"
            },
            "@type": "dm:DifferenceModel",
            "md:Model.scenarioTime": "2024-05-13T08:37:06.830Z",
            "md:Model.modelingAuthoritySet": "unknown",
            "md:Model.profile": "http://profile/"
        },
        {
            "@id": "_:b0",
            "@graph": [
                {
                    "@id": "cim:_individual-D-1",
                    "@type": "cim:D"
                },
                {
                    "@id": "cim:_individual-A-1",
                    "cim:A-2-B": {
                        "@id": "cim:_individual-B-1"
                    },
                    "@type": "cim:A"
                },
                {
                    "@id": "cim:_individual-B-1",
                    "@type": "cim:B"
                }
            ]
        }
    ],
    "@context": {
        "dm": "http://iec.ch/2002/schema/CIM_difference_model#",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "cim": "http://iec.ch/TC57/2014/CIM-schema-cim16#",
        "meta": "http://iec.ch/TC57/2014/CIM-schema-cim16#",
        "md": "http://iec.ch/TC57/61970-552/ModelDescription/1#"
    }
}

TriG:

PREFIX cim:  <http://iec.ch/TC57/2014/CIM-schema-cim16#>
PREFIX dm:   <http://iec.ch/2002/schema/CIM_difference_model#>
PREFIX md:   <http://iec.ch/TC57/61970-552/ModelDescription/1#>
PREFIX meta: <http://iec.ch/TC57/2014/CIM-schema-cim16#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

cim:_248c809d-1d7b-397c-830f-6928007ae6d9
        rdf:type                       dm:DifferenceModel;
        dm:forwardDifferences          _:b0;
        dm:reverseDifferences          "\n    "^^rdf:XMLLiteral;
        cim:Model.differenceFrom       "2024-04-01T07:55:06.779475Z";
        cim:Model.differenceTo         "2027-10-01T08:37:06.779475Z";
        cim:Model.modelVersionIri      "http://ontology.adms.ru/UIP/md/2021-1";
        md:Model.created               "2024-05-13T08:37:06.830Z";
        md:Model.modelingAuthoritySet  "unknown";
        md:Model.profile               "http://profile/";
        md:Model.scenarioTime          "2024-05-13T08:37:06.830Z";
        md:Model.version               "1715589426" .

_:b0 {
    cim:_individual-D-1
            rdf:type  cim:D .
    
    cim:_individual-A-1
            rdf:type   cim:A;
            cim:A-2-B  cim:_individual-B-1 .
    
    cim:_individual-B-1
            rdf:type  cim:B .
}

@afs
Member

afs commented May 18, 2024

PR #2477 does not cover bare parseType.

The test files Test Configurations v3.0.2 do not use un-namespaced parseType.

I'm considering this bug report as "done" until there is new information about bare parseType.
