v4.0.0

Released by @kysrpex on 07 Dec, 14:41 (commit 78a94bd)

Changes since the last major version, as well as their motivation, are summarized below. For a detailed list of all changes since the last major version, check the commit history since the last commit common to this and the previous major release.

  • Rebranded OSP-core to SimPhoNy OSP. Replaced domain simphony-project.eu with simphony-osp.eu. Renamed Python package from osp-core to simphony-osp.

  • Python 3.6 is no longer supported.

Ontology management

In SimPhoNy, data takes the form of knowledge-graphs based on ontologies. All information is represented in terms of ontology individuals that belong to specific ontology classes, have specific attributes and can be connected to other individuals through relationships. Classes, attributes and relationships are defined in the ontologies. Therefore, in order for SimPhoNy to be able to properly interpret the data, such ontologies need to be made available to it. For that purpose, SimPhoNy includes an ontology management tool called pico.

This section is focused on the changes that affect the ontology languages supported by SimPhoNy, the changes affecting pico, the included ontology packages, and the definition of ontology packages.

SimPhoNy v4 aims to adhere to the OWL and RDFS standards from W3C, which led to the following changes:

In addition, the following ontology packages are now bundled with SimPhoNy:

Changes were also made to the way ontology packages are defined:

  • The collection of keywords used to define ontology packages has been reduced (compare it with the keywords available in SimPhoNy v3). This change is motivated by the replacement of the CUDS concept with that of the ontology individual (see the Assertional knowledge section of these release notes), as well as by the simplified retrieval of entities from ontology namespaces (see the Terminological knowledge section of these release notes).

Terminological knowledge

SimPhoNy v4 is geared towards the use of any OWL ontology or RDFS vocabulary. In the previous major version, there was a bias towards the EMMO ontology, which had a few consequences for the handling of terminological knowledge. For example, as the EMMO did not require the use of annotation properties for A-Box construction, annotation properties were not supported. The following changes address this:

  • Support for annotation properties, which are accessed normally from the namespace object.

  • It is no longer necessary to specify whether entities from ontology namespaces should be accessed by label or by prefix. Both can be used at all times, and any label in any language is accepted.
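The lookup behaviour described above can be illustrated with a small, self-contained sketch. It is not SimPhoNy's implementation: the `Namespace` class, its `add` method, the example IRI and the identifier `C_001` are all hypothetical and only show the idea that an entity can be reached either through its identifier or through any of its labels.

```python
class Namespace:
    """Toy namespace (hypothetical, not the SimPhoNy class)."""

    def __init__(self, iri):
        self.iri = iri
        self._identifiers = set()   # identifiers of the known entities
        self._labels = {}           # label text (any language) -> identifier

    def add(self, identifier, labels):
        """Register an entity with one label per language tag."""
        self._identifiers.add(identifier)
        for text in labels.values():
            self._labels[text] = identifier

    def __getattr__(self, name):
        # Only called when the normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._identifiers:
            return self.iri + name
        if name in self._labels:
            return self.iri + self._labels[name]
        raise AttributeError(name)


city = Namespace("https://example.org/city#")
city.add("C_001", {"en": "Citizen", "it": "Cittadino"})

# The identifier and every label resolve to the same entity IRI.
same_entity = city.C_001 == city.Citizen == city.Cittadino
```

No per-namespace switch is needed in this sketch: identifier and labels are always checked, which mirrors the change described in the last bullet.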

Assertional knowledge (former CUDS API)

In SimPhoNy v3, assertional knowledge was represented as CUDS objects. The idea behind CUDS objects was to fit ontology entities in a hierarchical structure, inspired by the concept of mereotopology, that the EMMO ontology makes heavy use of.

Idea behind the CUDS concept from SimPhoNy v3.

CUDS objects were therefore well suited to use cases where mereotopology plays a role. However, the needs of the community of SimPhoNy users were different: most of them needed a knowledge graph rather than a hierarchical structure. This led to modifications in the implementation that tried to reach a compromise by mixing the original concept with RDF. However, a graph is not a hierarchical structure, so this mix imposed artificial constraints and limitations that made SimPhoNy impractical whenever data had to be structured as a graph.

For this reason, SimPhoNy v4 forgoes the attempt to make a hierarchical structure compatible with a graph and is designed instead to deal with ontology-based data in its natural graph form. This has led to the following changes:

  • The concept of CUDS as a recursive container of ontology individuals, controlled by the active and passive relationships defined either in the ontology package file or in YAML ontologies, is no longer part of the software. To transfer ontology individuals between sessions, just select them using any of the available querying methods and add them to the new session. The behaviour that was available for CUDS objects can be reproduced with the find function from the simphony_osp.tools.search module by passing the formerly defined active relationships as an argument.

  • As a consequence, the syntax for defining ontology packages has been simplified. Active and passive relationship definitions are no longer needed.
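The transfer workflow described in the first bullet (select individuals by following a chosen set of relationships, then add them to another session) can be sketched in plain Python. This is a conceptual illustration only: it does not use the SimPhoNy API, sessions are modelled as plain sets of triples, and the function names and example relationships are hypothetical.

```python
from collections import deque


def find_reachable(triples, root, relationships):
    """Return all individuals reachable from root via the given relationships."""
    found = {root}
    queue = deque([root])
    while queue:
        subject = queue.popleft()
        for s, p, o in triples:
            if s == subject and p in relationships and o not in found:
                found.add(o)
                queue.append(o)
    return found


def copy_individuals(source, target, individuals):
    """Copy every triple whose subject is one of the individuals into target."""
    target.update(t for t in source if t[0] in individuals)


# "Sessions" modelled as sets of (subject, relationship, object) triples.
source_session = {
    ("freiburg", "hasInhabitant", "alice"),
    ("alice", "hasChild", "bob"),
    ("freiburg", "hasNeighbour", "basel"),
    ("basel", "hasInhabitant", "carol"),
}
target_session = set()

# Follow only the relationships that used to be declared as "active".
selected = find_reachable(
    source_session, "freiburg", {"hasInhabitant", "hasChild"}
)
copy_individuals(source_session, target_session, selected)
```

Here `selected` contains `freiburg`, `alice` and `bob`; `basel` and `carol` are left out because `hasNeighbour` is not among the relationships being followed.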

Other changes unrelated to the former CUDS concept have also been made in order to better support arbitrary OWL ontologies and RDFS vocabularies:

  • The label attribute of ontology individuals now refers to their rdfs:label/skos:prefLabel and is configurable.

  • Added the index notation as an additional manner to manage relationships, attributes and the newly supported annotations.

  • Using the index notation, any attribute can be assigned to any ontology individual. In SimPhoNy v3, the attributes that could be assigned were restricted to classes declared as the attribute's domain.

  • Non-functional ontology attributes are now supported.
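To make the index notation and non-functional attributes more concrete, the following toy model stores values as triples and lets a single attribute hold several values. It is a sketch under stated assumptions, not the SimPhoNy implementation: the `Individual` class and the `ex:` identifiers are hypothetical.

```python
class Individual:
    """Toy ontology individual backed by a shared set of triples."""

    def __init__(self, iri, graph):
        self.iri = iri
        self._graph = graph  # set of (subject, predicate, object) triples

    def __getitem__(self, predicate):
        # Non-functional: a predicate may map to any number of values.
        return {o for s, p, o in self._graph
                if s == self.iri and p == predicate}

    def __setitem__(self, predicate, values):
        if isinstance(values, (str, int, float)):
            values = {values}  # allow assigning a single value
        old = {t for t in self._graph
               if t[0] == self.iri and t[1] == predicate}
        self._graph.difference_update(old)
        self._graph.update((self.iri, predicate, v) for v in values)


graph = set()
alice = Individual("ex:alice", graph)

alice["ex:age"] = 30                      # single value
alice["ex:nickname"] = {"Al", "Ally"}     # several values for one attribute
alice["ex:nickname"] |= {"Lis"}           # read, extend, write back
```

No domain check is performed on assignment in this sketch, in the spirit of the third bullet: any attribute can be attached to any individual.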

Sessions and wrappers (from the user's perspective)

SimPhoNy v4 introduces significant changes in the Wrapper API. Although most changes only affect developers (see the section "Wrapper and Operations APIs (changes from the developer's perspective)" of these release notes), some of them also affect regular users.

Several of those stem from the fact that the CUDS concept is no longer part of SimPhoNy. In SimPhoNy v3, CUDS objects also enforced a hierarchical structure on the interaction with software through wrappers, resulting in a myriad of issues. Basing the SimPhoNy v4 internals completely on the RDF standard made it possible to tackle such issues. This translates into a number of changes for users when dealing with sessions and wrappers:

  • Sessions no longer need to be instantiated with a wrapper object in order to work with them. The wrapper object was a "root" object from which all ontology individuals in the session were reachable (in line with the hierarchical CUDS concept), and it is no longer needed. Ontology individuals are simply contained in the session and can be distributed among several connected components of the session's knowledge graph. Therefore, there can no longer be "orphan" ontology individuals (since they can always be accessed), and sessions no longer need to be "pruned".

  • Improved RDF import and export features to better accommodate the new possibilities, although they are arguably still complex.

One of the features of SimPhoNy v3, file support, was a great opportunity to extend the ontology-based knowledge graphs so that even data that cannot be ontologized, or may not be worth ontologizing, can conceptually still fit in knowledge graphs. Thus, in SimPhoNy v4, files are considered first-class citizens and are tightly integrated with the Wrapper API, even though the Wrapper API still needs to evolve to offer all the features that would ideally be desired. From the user's perspective, however, this only translates to a small change:

  • Files are now managed through SimPhoNy operations, and can be uploaded and downloaded on-demand.

Several changes targeting usability were also introduced:

  • Session locking feature: sessions can be set as default using the with statement without needing to close them afterwards.

  • Ontology individuals imported from a file keep their custom IRIs (issue #758).
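The session locking feature in the first bullet can be pictured with a context manager that makes a session the default only inside the `with` block and leaves it open afterwards. The sketch below is a simplified model, not SimPhoNy's `Session` class: the stack of default sessions and the `create_individual` helper are assumptions made for illustration.

```python
class Session:
    """Toy session that can be set as default using the with statement."""

    _stack = []  # the innermost default session is the last element

    def __init__(self, name):
        self.name = name
        self.individuals = []
        self.closed = False

    def __enter__(self):
        Session._stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Only stop being the default; the session is NOT closed here.
        Session._stack.pop()
        return False  # never swallow exceptions

    @classmethod
    def default(cls):
        return cls._stack[-1]


core = Session("core")
Session._stack.append(core)  # fallback default session


def create_individual(label):
    """Store a new individual in whichever session is currently the default."""
    Session.default().individuals.append(label)


scratch = Session("scratch")
create_individual("a")          # goes to the fallback session
with scratch:
    create_individual("b")      # goes to scratch
create_individual("c")          # fallback again; scratch is still usable
scratch.individuals.append("d")
```

After leaving the block, `scratch` is no longer the default but remains open, so it can be used again later, directly or in another `with` statement.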

SimPhoNy v4 also introduces the Operations API, which aims to integrate code that has a relationship with specific classes from the ontology with the knowledge-graph.

Visualization

SimPhoNy's visualization tools were also lightly tweaked to blur the edges between T-Box and A-Box, accommodate the replacement of the CUDS concept with the graph structure, and improve their integration with the tools provided by the Jupyter project.

  • ontology2dot and cuds2dot are now a single tool, called semantic2dot. semantic2dot can mix plots of A-Boxes and T-Boxes. Several T-Boxes can be simultaneously plotted.

  • semantic2dot also shows plots directly in Jupyter notebooks without the need to save them to a file first.

Wrapper and Operations APIs (changes from the developer's perspective)

Although the Wrapper API in SimPhoNy v3 did its job, there was a lot of room for improvement. SimPhoNy v4 tackles some of the issues and constitutes an improvement, but it is worth remarking that this statement still applies to the current version. The issues affecting the Wrapper API in SimPhoNy v3 were identified not just within the development team, but also thanks to feedback from its community of users. In addition, the change of paradigm from the hierarchical view of data to the graph view of data strengthened the need for changes. The issues were:

  • In SimPhoNy v3, all CUDS objects to be handled by a wrapper were supposed to be reachable from a root CUDS object, the Wrapper object. In addition, as CUDS objects were conceived as a data structure, they were treated like actual separate chunks of data that had to be retrieved from the backend and "loaded". However, because the implementation ended up being a mixture of RDF and CUDS, what was happening behind the scenes in most cases was that an RDF graph was being stored, and loading CUDS objects just involved retrieving parts of this graph. Moreover, with the decision to abandon the CUDS concept, the approach lost its meaning.

  • The Wrapper API was documented but, according to the feedback that we received, not in sufficient depth.

  • Following up on the previous point, a specific pain point was that it was not clear how and when wrapper methods are called by SimPhoNy, and therefore it was difficult to understand their purpose. Many people developing wrappers even implemented functionality meant to be part of some methods in different ones, and left the former blank.

SimPhoNy v4 keeps the main idea behind the Wrapper API, which is to offer a triplestore-like abstraction of the underlying software (although on an ontological level rather than on the triple level). The methods of the Wrapper API were, however, adapted to the new reality and to the feedback received, which led to a unification of all the session classes into a single one that hopefully offers more flexibility while being somewhat easier to understand. Examples of this additional flexibility are the optional RDF handler methods and the support for files as first-class citizens, which now have dedicated methods in the Wrapper API.

As a rough visual summary, the paradigm changed from

Simplified session inheritance scheme for OSP-core (SimPhoNy v3). Taken from SimPhoNy v3 documentation.

Connection between the semantic layer and syntactic layer through the interoperability layer (wrappers) in an early version of SimPhoNy v3. The methods do not resemble the current state of SimPhoNy v3’s Wrapper API. Visit the SimPhoNy v3 documentation for a list of the methods in its Wrapper API.

to the one shown below.

UML object diagram showing the objects involved in the SimPhoNy v4 wrapper mechanism that are relevant from a developer’s perspective. Taken from SimPhoNy v4 documentation.

Flowchart showing the catalogue of possible user actions and how they translate to calls to the methods of the wrapper class, that the wrapper developer must implement. Taken from SimPhoNy v4 documentation.

Additionally, SimPhoNy v4 also introduces the Operations API, which aims to integrate code and knowledge-graph.

Other, less important changes affecting wrapper development are related to the custom data types available in SimPhoNy:

  • The fixed-length string custom data type has been removed.

  • The custom vector data type has been enhanced and is also now serialized to an RDF literal as bytes using b85 encoding. There is a single vector data type regardless of the vector's shape or length.
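As an illustration of what serializing a vector as b85-encoded bytes can look like, here is a minimal round trip using only the Python standard library. The byte layout is an assumption: this sketch packs a flat list of little-endian doubles and does not store shape or data type information, so it should not be taken as the actual format used by SimPhoNy.

```python
import base64
import struct


def vector_to_literal(values):
    """Encode a flat sequence of floats as a b85 text literal."""
    payload = struct.pack(f"<{len(values)}d", *values)  # 8 bytes per value
    return base64.b85encode(payload).decode("ascii")


def literal_to_vector(text):
    """Decode a b85 text literal back into a list of floats."""
    payload = base64.b85decode(text)
    count = len(payload) // 8
    return list(struct.unpack(f"<{count}d", payload))


original = [1.0, 2.5, -3.0]
literal = vector_to_literal(original)      # plain ASCII string
restored = literal_to_vector(literal)      # equal to the original values
```

Because the values are stored as raw bytes rather than as decimal text, the round trip is exact and the literal length depends only on the number of elements.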

Internal changes in the software architecture

SimPhoNy v4 also introduces radical changes in the code and software architecture, motivated by:

  • the ambition to adhere to the RDF standard
  • facilitating the implementation of new features and improving the maintainability of the codebase

Said changes can be summarized as:

  • There is now a strict separation in the code between the interface to terminological and assertional knowledge, the sessions (where terminological and assertional knowledge lives), RDF graphs (the data structure where sessions store ontology entities encoded as RDF) and wrappers. This architectural change addresses several issues or renders them obsolete (some examples, among others, are #210, #422, #438, #551, #624, #669).

  • Sessions can now actually hold both T-Box and A-Boxes (even though users are meant to use them only to store A-Boxes). The T-Box that is used to describe an A-Box can be different for each session (although by default it is a T-Box containing the ontologies installed with pico). This feature replaces the former namespace registry.

  • SQLite and SQLAlchemy wrappers are now based on rdflib-sqlalchemy.

  • Most of the code features type-hinting now.

  • The transport layer now works at the triple level.

  • All user-facing tools have been moved to the simphony_osp.tools module.

  • The City ontology has been migrated to OWL.

  • The schema validation feature is no longer available; it will likely be replaced in the future by SHACL.

  • Removed Dockerfile.