Include mixin classes in JSON schema definitions #1935

Open: mih wants to merge 1 commit into main

@mih commented Feb 20, 2024

This fixes a problem that was discovered while working on psychoinformatics-de/datalad-concepts@d8bb1f1

Specifically, the JSON schema output generated from that linkml schema referenced any base classes that were not abstract, but did not include definitions for mixin classes.

This change aligns the criterion for including definitions with the one used for references: abstract classes are excluded, but mixin classes are not.
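
As an illustration only, the change in criterion can be sketched as below. This is not the linkml generator code: `ClassDef`, `emitted_before()`, `emitted_after()`, and the class name `SomeAbstractBase` are hypothetical stand-ins, and the "before" filter is an assumption about the previous behavior based on the description above.

```python
from dataclasses import dataclass


@dataclass
class ClassDef:
    """Hypothetical stand-in for a LinkML class definition."""
    name: str
    abstract: bool = False
    mixin: bool = False


def emitted_before(classes):
    # Assumed previous criterion: no definition for abstract or mixin classes.
    return [c.name for c in classes if not (c.abstract or c.mixin)]


def emitted_after(classes):
    # Criterion described in this PR: only abstract classes are skipped.
    return [c.name for c in classes if not c.abstract]


classes = [
    ClassDef("SomeAbstractBase", abstract=True),  # hypothetical name
    ClassDef("Distribution", mixin=True),
    ClassDef("AnnexDistributionSE"),
]

print(emitted_before(classes))
print(emitted_after(classes))
```

With the second filter, a mixin such as `Distribution` gets a definition, so a reference to it can resolve.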

It may be possible to exclude the references to mixin classes from the JSON schema output, and thereby continue to exclude the definitions for mixin classes too. However, my attempt to address this in `get_type_info_for_slot_subschema()` was not fully successful: the outcome depended on the generator parameterization.

Apologies for not including a test case, I am not yet familiar with a suitable development setup.

@cmungall (Member)

Thanks @mih! Can you give a bit more insight as to your motivation? It wasn't totally clear from the linked PR in your repo.

While in general there should not be a need to include mixins (these should not be instantiated, and jsonschemagen should successfully roll-down mixin slots and constraints), it is perhaps a little overopinionated of us to exclude them. One possibility is to make this an option (although we already have a lot of options...)
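
To make "roll-down" concrete, here is a minimal sketch of the idea, not the jsonschemagen implementation. It assumes a toy schema held in a plain dict; `rolled_down_slots()` and the class names `HasChecksum` and `File` are hypothetical.

```python
def rolled_down_slots(name, classes):
    """Collect the slots of class `name`, including those inherited
    from its `is_a` parent and from its mixins (toy model)."""
    cls = classes[name]
    parents = ([cls["is_a"]] if cls.get("is_a") else []) + cls.get("mixins", [])
    slots = {}
    for parent in parents:
        # Inherited slots are copied down into the concrete class.
        slots.update(rolled_down_slots(parent, classes))
    # Slots declared on the class itself take precedence.
    slots.update(cls.get("slots", {}))
    return slots


classes = {
    "HasChecksum": {"mixin": True, "slots": {"checksum": {"type": "string"}}},
    "File": {"mixins": ["HasChecksum"], "slots": {"byte_size": {"type": "integer"}}},
}

print(rolled_down_slots("File", classes))
```

In this model the concrete class carries every slot it needs, so the mixin never has to appear as a definition of its own, provided nothing references it.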

@mih (Author) commented Feb 22, 2024

> While in general there should not be a need to include mixins (these should not be instantiated, and jsonschemagen should successfully roll-down mixin slots and constraints) [...]

My motivation is just to get a valid, self-sufficient JSON schema generated by linkml. Without this patch, the `jsonschema` validator complains about the output:

jsonschema.exceptions._WrappedReferencingError: PointerToNowhere: '/$defs/Distribution' does not exist within {...}

I am including the generated JSON schema below. A Distribution class is referenced (it is a mixin), but the definition of it is not included. From my POV, either the reference needs to be removed, because it is not needed (but I failed to find a code change that does this reliably), or the definition needs to be included.

Invalid JSON schema example
{
    "$defs": {
        "Activity": {
            "additionalProperties": false,
            "description": "An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.",
            "title": "Activity",
            "type": "object"
        },
        "Agent": {
            "additionalProperties": false,
            "description": "Something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.",
            "title": "Agent",
            "type": "object"
        },
        "AnnexDistributionSE": {
            "additionalProperties": false,
            "description": "Schema element for a `AnnexDistribution`.",
            "properties": {
                "access_service": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/DataService"
                        },
                        {
                            "$ref": "#/$defs/AnnexRemote"
                        },
                        {
                            "$ref": "#/$defs/AnnexRemoteSE"
                        }
                    ],
                    "description": "A data service that gives access to the distribution of a `Resource`"
                },
                "byte_size": {
                    "description": "The size of a distribution in bytes.",
                    "type": "integer"
                },
                "checksum": {
                    "$ref": "#/$defs/Checksum",
                    "description": "The checksum property provides a mechanism that can be used to verify that the contents of a file or package have not changed."
                },
                "qualified_access": {
                    "description": "Link to a description of a `access_service` relationship with `DCAT:DataService`.",
                    "items": {
                        "$ref": "#/$defs/QualifiedAnnexAccessSE"
                    },
                    "type": "array"
                }
            },
            "title": "AnnexDistributionSE",
            "type": "object"
        },
        "AnnexRemoteSE": {
            "additionalProperties": false,
            "description": "Schema element for a `AnnexRemote`.",
            "properties": {
                "endpoint_url": {
                    "description": "The root location or primary endpoint of the service (a resolvable IRI).",
                    "type": "string"
                },
                "meta_id": {
                    "description": "Unique identifier of a metadata object.",
                    "type": "string"
                },
                "meta_type": {
                    "description": "Type designator of a metadata object.",
                    "enum": [
                        "dlccs:AnnexRemoteSE"
                    ],
                    "type": "string"
                },
                "uuid": {
                    "description": "Associated UUID identifier.",
                    "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
                    "type": "string"
                }
            },
            "required": [
                "meta_id",
                "uuid",
                "meta_type"
            ],
            "title": "AnnexRemoteSE",
            "type": "object"
        },
        "AnnexedFileSE": {
            "additionalProperties": false,
            "description": "Schema element for a `AnnexedFile`.",
            "properties": {
                "distribution": {
                    "$ref": "#/$defs/AnnexDistributionSE",
                    "description": "An available distribution of a resource."
                },
                "gitsha": {
                    "description": "SHA1 identifier of a Git object.",
                    "pattern": "^[0-9a-f]{40}$",
                    "type": "string"
                },
                "has_part": {
                    "description": "A related resource that is included either physically or logically in the described resource.",
                    "items": {
                        "anyOf": [
                            {
                                "$ref": "#/$defs/Resource"
                            },
                            {
                                "$ref": "#/$defs/Dataset"
                            },
                            {
                                "$ref": "#/$defs/Directory"
                            },
                            {
                                "$ref": "#/$defs/File"
                            },
                            {
                                "$ref": "#/$defs/FileInGit"
                            },
                            {
                                "$ref": "#/$defs/GitBlobSE"
                            },
                            {
                                "$ref": "#/$defs/AnnexedFile"
                            },
                            {
                                "$ref": "#/$defs/AnnexedFileSE"
                            },
                            {
                                "$ref": "#/$defs/DataladDatasetVersion"
                            },
                            {
                                "$ref": "#/$defs/DataladDatasetVersionSE"
                            }
                        ]
                    },
                    "type": "array"
                },
                "meta_id": {
                    "description": "SHA1 based identifier in the form of a CURIE with an explicit `gitsha:` namespace prefix.",
                    "pattern": "^gitsha:[0-9a-f]{40}$",
                    "type": "string"
                },
                "meta_type": {
                    "description": "Type designator of a metadata object.",
                    "enum": [
                        "dlccs:AnnexedFileSE"
                    ],
                    "type": "string"
                },
                "qualified_part": {
                    "description": "Link to a description of a `hasPart` relationship with another resource.",
                    "items": {
                        "anyOf": [
                            {
                                "$ref": "#/$defs/QualifiedPart"
                            },
                            {
                                "$ref": "#/$defs/QualifiedGitTrackedPartSE"
                            }
                        ]
                    },
                    "type": "array"
                },
                "type": {
                    "description": "Type of resource, e.g. `Dataset`, `Directory`, or `File`.",
                    "type": "string"
                }
            },
            "required": [
                "meta_id",
                "meta_type",
                "gitsha"
            ],
            "title": "AnnexedFileSE",
            "type": "object"
        },
        "Checksum": {
            "additionalProperties": false,
            "description": "A Checksum is a value that allows to check the integrity of the contents of a file. Even small changes to the content of the file will change its checksum. This class allows the results of a variety of checksum and cryptographic message digest algorithms to be represented.",
            "properties": {
                "algorithm": {
                    "description": "Identifies the algorithm used to produce the subject `Checksum`.",
                    "type": "string"
                },
                "digest": {
                    "description": "Lower case hexadecimal encoded checksum digest value produced using a specific algorithm.",
                    "pattern": "^[a-fA-F0-9]+$",
                    "type": "string"
                }
            },
            "title": "Checksum",
            "type": "object"
        },
        "ChecksumAlgorithm": {
            "description": "",
            "enum": [
                "md5",
                "sha1",
                "sha256"
            ],
            "title": "ChecksumAlgorithm",
            "type": "string"
        },
        "ComponentSE": {
            "additionalProperties": false,
            "description": "Base class for any recognized dataset component type. This class should never be used directly, only its subclasses.",
            "properties": {
                "meta_id": {
                    "description": "Unique identifier of a metadata object.",
                    "type": "string"
                },
                "meta_type": {
                    "description": "Type designator of a metadata object.",
                    "enum": [
                        "dlccs:Component"
                    ],
                    "type": "string"
                }
            },
            "required": [
                "meta_id",
                "meta_type"
            ],
            "title": "ComponentSE",
            "type": "object"
        },
        "ContainerSE": {
            "additionalProperties": false,
            "description": "A container for dataset component objects.",
            "properties": {
                "components": {
                    "description": "Component list.",
                    "items": {
                        "anyOf": [
                            {
                                "$ref": "#/$defs/ComponentSE"
                            },
                            {
                                "$ref": "#/$defs/GitTrackedSE"
                            },
                            {
                                "$ref": "#/$defs/AnnexRemoteSE"
                            },
                            {
                                "$ref": "#/$defs/DataladDatasetSE"
                            },
                            {
                                "$ref": "#/$defs/DataladDatasetVersionSE"
                            },
                            {
                                "$ref": "#/$defs/GitBlobSE"
                            },
                            {
                                "$ref": "#/$defs/AnnexedFileSE"
                            }
                        ]
                    },
                    "type": "array"
                }
            },
            "title": "ContainerSE",
            "type": "object"
        },
        "DataladDatasetSE": {
            "additionalProperties": false,
            "description": "Schema element for a `DataladDataset`.",
            "properties": {
                "meta_id": {
                    "description": "Unique identifier of a metadata object.",
                    "type": "string"
                },
                "meta_type": {
                    "description": "Type designator of a metadata object.",
                    "enum": [
                        "dlccs:DataladDatasetSE"
                    ],
                    "type": "string"
                },
                "uuid": {
                    "description": "UUID-style unique identifier. This property is the main distinguishing feature between a DataLad dataset and a plain Git or git-annex repository branch.",
                    "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
                    "type": "string"
                }
            },
            "required": [
                "uuid",
                "meta_id",
                "meta_type"
            ],
            "title": "DataladDatasetSE",
            "type": "object"
        },
        "DataladDatasetVersionSE": {
            "additionalProperties": false,
            "description": "TODO",
            "properties": {
                "distribution": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/Distribution"
                        },
                        {
                            "$ref": "#/$defs/AnnexDistribution"
                        },
                        {
                            "$ref": "#/$defs/AnnexDistributionSE"
                        }
                    ],
                    "description": "An available distribution of a resource."
                },
                "has_annex_remote": {
                    "description": "A git-annex (special) remote associated with a repository",
                    "items": {
                        "type": "string"
                    },
                    "type": "array"
                },
                "has_part": {
                    "description": "A related resource that is included either physically or logically in the described resource.",
                    "items": {
                        "type": "string"
                    },
                    "type": "array"
                },
                "is_version_of": {
                    "description": "A related resource of which the described resource is a version.",
                    "type": "string"
                },
                "meta_id": {
                    "description": "SHA1 based identifier in the form of a CURIE with an explicit `gitsha:` namespace prefix.",
                    "pattern": "^gitsha:[0-9a-f]{40}$",
                    "type": "string"
                },
                "meta_type": {
                    "description": "Type designator of a metadata object.",
                    "enum": [
                        "dlccs:DataladDatasetVersionSE"
                    ],
                    "type": "string"
                },
                "qualified_part": {
                    "description": "Link to a description of a `hasPart` relationship with another resource.",
                    "items": {
                        "$ref": "#/$defs/QualifiedGitTrackedPartSE"
                    },
                    "type": "array"
                },
                "type": {
                    "description": "Type of resource, e.g. `Dataset`, `Directory`, or `File`.",
                    "type": "string"
                }
            },
            "required": [
                "meta_id",
                "meta_type"
            ],
            "title": "DataladDatasetVersionSE",
            "type": "object"
        },
        "Directory": {
            "additionalProperties": false,
            "description": "Cataloging structure which contains references to other files, and possibly other directories (or datasets).",
            "properties": {
                "has_part": {
                    "description": "A related resource that is included either physically or logically in the described resource.",
                    "items": {
                        "anyOf": [
                            {
                                "$ref": "#/$defs/Resource"
                            },
                            {
                                "$ref": "#/$defs/Dataset"
                            },
                            {
                                "$ref": "#/$defs/Directory"
                            },
                            {
                                "$ref": "#/$defs/File"
                            },
                            {
                                "$ref": "#/$defs/FileInGit"
                            },
                            {
                                "$ref": "#/$defs/GitBlobSE"
                            },
                            {
                                "$ref": "#/$defs/AnnexedFile"
                            },
                            {
                                "$ref": "#/$defs/AnnexedFileSE"
                            },
                            {
                                "$ref": "#/$defs/DataladDatasetVersion"
                            },
                            {
                                "$ref": "#/$defs/DataladDatasetVersionSE"
                            }
                        ]
                    },
                    "type": "array"
                },
                "qualified_part": {
                    "description": "Link to a description of a `hasPart` relationship with another resource.",
                    "items": {
                        "anyOf": [
                            {
                                "$ref": "#/$defs/QualifiedPart"
                            },
                            {
                                "$ref": "#/$defs/QualifiedGitTrackedPartSE"
                            }
                        ]
                    },
                    "type": "array"
                },
                "type": {
                    "description": "Type of resource, e.g. `Dataset`, `Directory`, or `File`.",
                    "type": "string"
                }
            },
            "title": "Directory",
            "type": "object"
        },
        "Entity": {
            "additionalProperties": false,
            "description": "A physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary.",
            "title": "Entity",
            "type": "object"
        },
        "GitBlobSE": {
            "additionalProperties": false,
            "description": "Schema element for a `FileInGit`.",
            "properties": {
                "distribution": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/Distribution"
                        },
                        {
                            "$ref": "#/$defs/AnnexDistribution"
                        },
                        {
                            "$ref": "#/$defs/AnnexDistributionSE"
                        }
                    ],
                    "description": "An available distribution of a resource."
                },
                "gitsha": {
                    "description": "SHA1 identifier of a Git object.",
                    "pattern": "^[0-9a-f]{40}$",
                    "type": "string"
                },
                "has_part": {
                    "description": "A related resource that is included either physically or logically in the described resource.",
                    "items": {
                        "anyOf": [
                            {
                                "$ref": "#/$defs/Resource"
                            },
                            {
                                "$ref": "#/$defs/Dataset"
                            },
                            {
                                "$ref": "#/$defs/Directory"
                            },
                            {
                                "$ref": "#/$defs/File"
                            },
                            {
                                "$ref": "#/$defs/FileInGit"
                            },
                            {
                                "$ref": "#/$defs/GitBlobSE"
                            },
                            {
                                "$ref": "#/$defs/AnnexedFile"
                            },
                            {
                                "$ref": "#/$defs/AnnexedFileSE"
                            },
                            {
                                "$ref": "#/$defs/DataladDatasetVersion"
                            },
                            {
                                "$ref": "#/$defs/DataladDatasetVersionSE"
                            }
                        ]
                    },
                    "type": "array"
                },
                "meta_id": {
                    "description": "SHA1 based identifier in the form of a CURIE with an explicit `gitsha:` namespace prefix.",
                    "pattern": "^gitsha:[0-9a-f]{40}$",
                    "type": "string"
                },
                "meta_type": {
                    "description": "Type designator of a metadata object.",
                    "enum": [
                        "dlccs:GitBlobSE"
                    ],
                    "type": "string"
                },
                "qualified_part": {
                    "description": "Link to a description of a `hasPart` relationship with another resource.",
                    "items": {
                        "anyOf": [
                            {
                                "$ref": "#/$defs/QualifiedPart"
                            },
                            {
                                "$ref": "#/$defs/QualifiedGitTrackedPartSE"
                            }
                        ]
                    },
                    "type": "array"
                },
                "type": {
                    "description": "Type of resource, e.g. `Dataset`, `Directory`, or `File`.",
                    "type": "string"
                }
            },
            "required": [
                "meta_id",
                "meta_type",
                "gitsha"
            ],
            "title": "GitBlobSE",
            "type": "object"
        },
        "GitTrackedSE": {
            "additionalProperties": false,
            "description": "Representation for any resource tracked by Git, thereby having a unique `gitsha`-based identifier.",
            "properties": {
                "meta_id": {
                    "description": "SHA1 based identifier in the form of a CURIE with an explicit `gitsha:` namespace prefix.",
                    "pattern": "^gitsha:[0-9a-f]{40}$",
                    "type": "string"
                },
                "meta_type": {
                    "description": "Type designator of a metadata object.",
                    "enum": [
                        "https://concepts.datalad.org/schemas/datalad-dataset-components/:GitTrackedSE"
                    ],
                    "type": "string"
                }
            },
            "required": [
                "meta_id",
                "meta_type"
            ],
            "title": "GitTrackedSE",
            "type": "object"
        },
        "Location": {
            "additionalProperties": false,
            "description": "A location can be an identifiable geographic place (ISO 19112), but it can also be a non-geographic place such as a directory, row, or column. As such, there are numerous ways in which location can be expressed, such as by a coordinate, address, landmark, and so forth.",
            "title": "Location",
            "type": "object"
        },
        "QualifiedAccess": {
            "additionalProperties": false,
            "description": "An association class for attaching additional information to an `access_service` relationship between a `DCAT:Distribution` and a `DCAT:DataService`.",
            "properties": {
                "access_id": {
                    "description": "An identifier with which a resource distribution can be requested from a `DataService`.",
                    "type": "string"
                },
                "relation": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/DataService"
                        },
                        {
                            "$ref": "#/$defs/AnnexRemote"
                        },
                        {
                            "$ref": "#/$defs/AnnexRemoteSE"
                        }
                    ],
                    "description": "The resource related to the source resource."
                }
            },
            "title": "QualifiedAccess",
            "type": "object"
        },
        "QualifiedAnnexAccess": {
            "additionalProperties": false,
            "description": "An association class for attaching additional information to an `access_service` relationship between a `DCAT:Distribution` and a `dlco:AnnexRemote`.",
            "properties": {
                "access_id": {
                    "description": "An identifier with which a resource distribution can be requested from a `DataService`.",
                    "type": "string"
                },
                "relation": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/AnnexRemote"
                        },
                        {
                            "$ref": "#/$defs/AnnexRemoteSE"
                        }
                    ],
                    "description": "The resource related to the source resource."
                }
            },
            "title": "QualifiedAnnexAccess",
            "type": "object"
        },
        "QualifiedAnnexAccessSE": {
            "additionalProperties": false,
            "description": "Schema element for a `QualifiedAnnexAccess`.",
            "properties": {
                "access_id": {
                    "description": "An identifier with which a resource distribution can be requested from a `DataService`.",
                    "type": "string"
                },
                "relation": {
                    "description": "The resource related to the source resource.",
                    "type": "string"
                }
            },
            "title": "QualifiedAnnexAccessSE",
            "type": "object"
        },
        "QualifiedGitTrackedPartSE": {
            "additionalProperties": false,
            "description": "Schema element for a `QualifiedPart`. Every part is represented by a `GitTrackedSE`.",
            "properties": {
                "at_location": {
                    "description": "The relative location of the part within the containing entity.",
                    "pattern": "[^/\u0000][^\u0000]*",
                    "type": "string"
                },
                "relation": {
                    "description": "The resource related to the source resource.",
                    "type": "string"
                }
            },
            "title": "QualifiedGitTrackedPartSE",
            "type": "object"
        },
        "SoftwareAgent": {
            "additionalProperties": false,
            "description": "Running software.",
            "title": "SoftwareAgent",
            "type": "object"
        }
    },
    "$id": "https://concepts.datalad.org/schemas/datalad-dataset-components",
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "additionalProperties": true,
    "description": "A container for dataset component objects.",
    "metamodel_version": "1.7.0",
    "properties": {
        "components": {
            "description": "Component list.",
            "items": {
                "anyOf": [
                    {
                        "$ref": "#/$defs/ComponentSE"
                    },
                    {
                        "$ref": "#/$defs/GitTrackedSE"
                    },
                    {
                        "$ref": "#/$defs/AnnexRemoteSE"
                    },
                    {
                        "$ref": "#/$defs/DataladDatasetSE"
                    },
                    {
                        "$ref": "#/$defs/DataladDatasetVersionSE"
                    },
                    {
                        "$ref": "#/$defs/GitBlobSE"
                    },
                    {
                        "$ref": "#/$defs/AnnexedFileSE"
                    }
                ]
            },
            "type": "array"
        }
    },
    "title": "datalad-dataset-components",
    "type": "object",
    "version": "UNRELEASED"
}
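
A quick way to spot this kind of problem without a full validator is to compare the local `$ref` targets against the keys under `$defs`. The sketch below uses only the Python standard library; `iter_refs()` and `dangling_refs()` are hypothetical helper names, the embedded schema is a reduced excerpt of the output above, and only `#/$defs/...` pointers are handled. It is a simplification: a property that happens to be named `$ref` would be misread as a reference.

```python
import json


def iter_refs(node):
    """Yield every string value stored under a "$ref" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def dangling_refs(schema):
    """Return the names referenced via '#/$defs/<name>' that have no definition."""
    prefix = "#/$defs/"
    defined = set(schema.get("$defs", {}))
    referenced = {r[len(prefix):] for r in iter_refs(schema) if r.startswith(prefix)}
    return sorted(referenced - defined)


# Reduced excerpt of the generated schema shown above.
schema = json.loads("""
{
    "$defs": {
        "AnnexDistributionSE": {"type": "object"},
        "GitBlobSE": {
            "properties": {
                "distribution": {
                    "anyOf": [
                        {"$ref": "#/$defs/Distribution"},
                        {"$ref": "#/$defs/AnnexDistributionSE"}
                    ]
                }
            },
            "type": "object"
        }
    },
    "properties": {
        "components": {"items": {"$ref": "#/$defs/GitBlobSE"}, "type": "array"}
    }
}
""")

print(dangling_refs(schema))
```

Run against the full schema, the same check should list every referenced class that lacks a definition, not only `Distribution`.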

Thanks!
