RDF conversion/dumping does not honour list_elements_ordered and produces questionable prefixes
#2069
Labels
bug
community-generated
rdf-generator
Describe the bug
Multi-valued slots behave oddly on conversion to RDF: by default they are unordered.
I know of the `list_elements_ordered` slot, and I assumed that it would ensure that RDF lists (i.e., containers) would also be ordered. Instead, the order is not preserved, i.e. the option appears to have no effect. I have tested both adding this field to the desired multi-valued slot and removing it. The output is identical.
A closely related facet: while shaping my data model so that every class instance has an element at the top level (i.e. RDF should know about the slots I've set as identifiers, and use them), I needed to specify `@base` manually. I have been using the `prefix_map` argument of `RDFLibDumper.dump()` for this. It produces an invalid prefix line starting with `@prefix @base:`, in addition to the single expected `@base:`. I don't know enough RDF to say whether this is invalid in all cases, however.
To reproduce
A minimum working example is as follows. Unfortunately, due to the prefix-map it is not very minimal, in the sense that I had to write a Python script rather than play around with the command-line tools.
I have attached a Python script called `convert.py`, in addition to a test manifest `schema.ordered.yaml` and test data file `data.yaml`. This rudimentary script takes three arguments: the 'mode' (`from-turtle` or `to-turtle`, you get the idea), the schema (see the attached `schema.ordered.yaml`) and the data file (the attached `data.ttl`; I redirected whatever the script generated to `data.yaml` and removed the duff second prefix line).
Python script:
Schema file:
Data file:
Running the script
Here's the initial conversion from YAML data to Turtle. Note the second prefix line, and that the elements of `scope` are not a container. As a result, they are effectively in alphabetical order, which is not what I expected at all.
Here's the conversion of this Turtle back to YAML, after removing the duff second prefix line. Interestingly, the order isn't coerced back into the schema's alphabetical order, which I would honestly have expected as well. This could be an incorrect assumption on my part.
Expected behavior
Firstly, and most severely, as mentioned above, I expect LinkML to honour the documented behaviour of the `list_elements_ordered` slot, and it does not. This slot claims the following:
"If True, then the order of elements of a multivalued slot is guaranteed to be preserved. If False, the order may still be preserved but this is not guaranteed"
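To make the expectation concrete, here is a minimal sketch in plain Python (no LinkML or rdflib involved) of why one triple per value cannot carry order, while an RDF collection built from `rdf:first` / `rdf:rest` can. The triples are modelled as string tuples, and the subject `ex:a1`, the predicate `ex:scope` and the values are placeholders, not the attached data. It only illustrates the RDF side of the expectation and says nothing about how LinkML serialises slots internally.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def as_repeated_triples(subject: str, predicate: str, values: List[str]) -> Set[Triple]:
    """One triple per value. A graph is a set of triples, so input order is lost."""
    return {(subject, predicate, value) for value in values}


def as_rdf_list(subject: str, predicate: str, values: List[str]) -> Set[Triple]:
    """Chain the values through blank nodes using rdf:first / rdf:rest."""
    if not values:
        return {(subject, predicate, RDF_NIL)}
    nodes = [f"_:b{i}" for i in range(len(values))]
    triples = {(subject, predicate, nodes[0])}
    for i, value in enumerate(values):
        rest = nodes[i + 1] if i + 1 < len(values) else RDF_NIL
        triples.add((nodes[i], RDF_FIRST, value))
        triples.add((nodes[i], RDF_REST, rest))
    return triples


def read_rdf_list(triples: Set[Triple], subject: str, predicate: str) -> List[str]:
    """Walk the rdf:first / rdf:rest chain to recover the original order."""
    lookup = {(s, p): o for s, p, o in triples}
    node = lookup[(subject, predicate)]
    values = []
    while node != RDF_NIL:
        values.append(lookup[(node, RDF_FIRST)])
        node = lookup[(node, RDF_REST)]
    return values


scope = ["zeta", "alpha", "mid"]  # placeholder values
chained = as_rdf_list("ex:a1", "ex:scope", scope)
print(read_rdf_list(chained, "ex:a1", "ex:scope"))
```

With the repeated-triple form, any ordering seen in the serialised output is chosen by the serialiser, which would be consistent with the alphabetical order observed above.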
Secondly, there is this invalid second `@prefix @base` line. This does not appear to be valid RDF, but it's not clear what actually triggers it. It should come from linkml-runtime's rdflib_dumper file, but that file seems to special-case `@base`, which implies that there may be something else going on here, i.e. it is possible that the line is added later:
https://github.com/linkml/linkml-runtime/blob/main/linkml_runtime/dumpers/rdflib_dumper.py#L50
There seem to be various issues about prefixes and RDF generation open, but nothing which specifically highlights this issue which I am having with it.
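For reference, a prefix label in Turtle cannot contain `@`, so a line beginning `@prefix @base:` is not valid Turtle. The sketch below shows the behaviour I would expect, assuming the base IRI is passed under the key `@base` in the prefix map as described above. The helper names `split_base` and `turtle_header` are hypothetical and the IRIs are placeholders. This is not the linkml-runtime implementation, only plain Python that builds header strings.

```python
from typing import Dict, List, Optional, Tuple

BASE_KEY = "@base"  # assumed special key, as used in this issue


def split_base(prefix_map: Dict[str, str]) -> Tuple[Optional[str], Dict[str, str]]:
    """Separate the base IRI from the ordinary prefix bindings."""
    base = prefix_map.get(BASE_KEY)
    prefixes = {k: v for k, v in prefix_map.items() if k != BASE_KEY}
    return base, prefixes


def turtle_header(prefix_map: Dict[str, str]) -> List[str]:
    """Render one @base directive (if any) followed by the @prefix directives."""
    base, prefixes = split_base(prefix_map)
    lines = []
    if base is not None:
        lines.append(f"@base <{base}> .")
    for prefix, iri in sorted(prefixes.items()):
        lines.append(f"@prefix {prefix}: <{iri}> .")
    return lines


# Placeholder IRIs, for illustration only
for line in turtle_header({"@base": "http://example.org/", "ex": "http://example.org/ns#"}):
    print(line)
```

The point is that the `@base` entry is removed from the map before any prefix is bound, so it can only ever appear once, as a base directive.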
Finally, when converting back to the YAML representation, the identifier field (`atom`) preserves the prefix. I don't think this is the correct behaviour, as we already have access to the prefix and it should be implicit, even though the `YAMLDumper.dump()` methods don't accept a `schemaview` argument like the `RDFLibDumper`'s do.
This final thing is a quirk, but it's a fairly severe issue for me, as I am using LinkML's YAML representation to make it easier for others to edit input data files, which are actually processed by a different computer program, as RDF. If there isn't a 1:1 mapping, it can be problematic, although so far it hasn't been in the way that the first two elements of this issue have been.
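The kind of 1:1 check I have in mind can be sketched as follows. It is plain Python over already-parsed records, where the helper names and the default prefix `ex` are hypothetical and the values are placeholders. It treats `ex:a1` and `a1` as the same identifier, and relies on Python list comparison being order-sensitive to flag a reordered `scope`.

```python
from typing import Any, Dict


def strip_prefix(identifier: str, default_prefix: str) -> str:
    """Drop 'prefix:' from an identifier if it uses the default prefix."""
    head = default_prefix + ":"
    return identifier[len(head):] if identifier.startswith(head) else identifier


def same_after_round_trip(
    before: Dict[str, Any],
    after: Dict[str, Any],
    id_slot: str = "atom",
    default_prefix: str = "ex",  # hypothetical default prefix
) -> bool:
    """Compare two records, ignoring only the default prefix on the identifier."""
    normalised = dict(after)
    if isinstance(normalised.get(id_slot), str):
        normalised[id_slot] = strip_prefix(normalised[id_slot], default_prefix)
    return before == normalised


original = {"atom": "a1", "scope": ["zeta", "alpha"]}  # placeholder record
print(same_after_round_trip(original, {"atom": "ex:a1", "scope": ["zeta", "alpha"]}))
print(same_after_round_trip(original, {"atom": "ex:a1", "scope": ["alpha", "zeta"]}))
```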
About your computer (if applicable, please complete the following information):
OS: Mac OS X Sonoma 14.4
Darwin darwin 23.4.0 Darwin Kernel Version 23.4.0: Wed Feb 21 21:51:37 PST 2024; root:xnu-10063.101.15~2/RELEASE_ARM64_T8112 arm64