Runtime measurement of shapes #180

Open
ajnelson-nist opened this issue Apr 25, 2023 · 3 comments

@ajnelson-nist (Contributor)

Across several projects, I've written SHACL shapes that have significant runtimes. In some cases, I know I'm asking for this, because I know there will be many nodes that match the shape selector. In other cases, I don't necessarily know I'm asking for it, and the shape's reach could be much broader than I realize. Separately, my choice of implementation style for a shape may simply have made it sluggish by design.

I have some hundreds of shapes, so I'm interested in getting runtime performance information to see which are slow, and whether that slowness is reasonable.

Is there some way that pyshacl's SHACL graph could be annotated with runtime statistics, e.g.:

For each shape S:
- record the number of matching targets,
- record the runtime of the overall shape.

I'm interested in knowing, basically, the run time per shape. I'm also interested specifically in the runtime of SPARQL constraints, but I think other constraints would be better reported at the level of their containing shape.
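
To make concrete the kind of measurement I mean, here is a rough, purely external sketch that times each explicitly typed sh:NodeShape by validating against it in isolation (copying the shape and its blank-node closure into its own graph). It over-counts shared setup work, misses shapes without an rdf:type sh:NodeShape triple, and doesn't follow references to other named shapes, but it needs nothing from pySHACL's internals; file names are placeholders:

```python
# Coarse, external per-shape timing sketch (no pySHACL internals involved).
# Assumptions: shapes carry an explicit rdf:type sh:NodeShape, and each shape's
# definition is reachable from the shape node purely through blank nodes.
import time

from pyshacl import validate
from rdflib import BNode, Graph, Namespace, RDF

SH = Namespace("http://www.w3.org/ns/shacl#")


def shape_closure(shapes_graph: Graph, shape) -> Graph:
    """Copy a shape plus everything reachable from it via blank nodes."""
    sub = Graph()
    stack, seen = [shape], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for p, o in shapes_graph.predicate_objects(node):
            sub.add((node, p, o))
            if isinstance(o, BNode):
                stack.append(o)
    return sub


data_graph = Graph().parse("data.ttl")        # placeholder file names
shapes_graph = Graph().parse("shapes.ttl")

timings = []
for shape in shapes_graph.subjects(RDF.type, SH.NodeShape):
    single_shape_graph = shape_closure(shapes_graph, shape)
    start = time.perf_counter()
    validate(data_graph, shacl_graph=single_shape_graph)
    timings.append((time.perf_counter() - start, shape))

# Slowest shapes first.
for seconds, shape in sorted(timings, reverse=True):
    print(f"{seconds:8.3f}s  {shape}")
```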

I'm guessing the output could be the original SHACL graph with a few literal properties added to each timed NodeShape/SPARQLConstraint. With shapes frequently being blank nodes, I don't see a better way to represent this. Besides, this output form might feed into a flame graph generator.
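
Just to make that output form concrete, the annotation could be as small as a couple of literals per timed node; the prof: vocabulary below is invented purely for illustration and is not part of SHACL or pyshacl:

```python
# Hypothetical annotation step: attach hit-count and runtime literals directly
# to the (possibly blank) shape node in a copy of the original SHACL graph.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

PROF = Namespace("urn:example:shacl-profile#")  # invented namespace


def annotate_shape(shapes_graph: Graph, shape, target_count: int, seconds: float) -> None:
    shapes_graph.add((shape, PROF.targetCount, Literal(target_count)))
    shapes_graph.add((shape, PROF.elapsedSeconds, Literal(seconds, datatype=XSD.double)))
```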

There's probably also some nuance in how deep the reporting would go, e.g. whether a NodeShape in a sh:or branch within a sh:xone branch would get its own statistics, and how statistics would be rolled up across the branching predicates (e.g. sh:xone, sh:and, sh:or).

I'm not too familiar with the internals of pyshacl, so this ask is predicated on a guess that there is a hookable spot at the shape level that could be timed and hit-counted.

Does this sound doable? Development-light or development-intensive?

I'm aware that the "hit count" has been somewhat covered in Issue 51, but I didn't see runtime measurement noted anywhere in the open or closed Issues.

@ashleysommer (Collaborator)

This is actually something that I've been thinking about for a while.

The debug output of PySHACL is confusing, often not helpful, and leads to issues such as #179.

I'm planning to change the content of the constraint debug info in debug mode. That includes emitting traces for constraints that do not fail, as opposed to only those that fail, so that you can see the execution result and execution order of all constraints. This can and should include profiling information such as time per constraint.

This can further be extended to Shapes, to include (as you suggested) the number of found targets and the overall runtime of each shape.

The first version of this will be released in the next PySHACL version.

@ajnelson-nist (Contributor, Author)

Exciting to hear! And I remember you've been thinking about bits and pieces of this for a while. I look forward to seeing the first pass.

@ashleysommer (Collaborator)

@ajnelson-nist The new PySHACL version v0.22.1 is released and contains the first pass of the new debug output. It is much more verbose now: it emits the duration of each Shape evaluation, the duration of each constraint check, and the current shape evaluation path and constraint evaluation path at each validator step.
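
For reference, turning the new output on from the Python API looks roughly like this (file names are placeholders):

```python
# Enable pySHACL's verbose debug trace, which now includes per-shape and
# per-constraint durations and the current evaluation paths.
from pyshacl import validate
from rdflib import Graph

data_graph = Graph().parse("data.ttl")       # placeholder file names
shapes_graph = Graph().parse("shapes.ttl")

conforms, results_graph, results_text = validate(
    data_graph,
    shacl_graph=shapes_graph,
    debug=True,  # emits the verbose evaluation trace described above
)
```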
