Runtime measurement of shapes #180

Open
ajnelson-nist opened this issue Apr 25, 2023 · 3 comments

@ajnelson-nist (Contributor)

Across several projects, I've written SHACL shapes that have significant runtimes. In some cases, I know I'm asking for this, because I know there will be many nodes that match the shape selector. In other cases, I don't necessarily know I'm asking for it, and the shape's reach could be much broader than I realize. Separately, my choice of implementation style for a shape may simply have made it sluggish by design.

I have some hundreds of shapes, so I'm interested in getting runtime performance information to see which are slow, and whether that slowness is reasonable.

Is there some way that pyshacl's SHACL graph could be annotated with runtime statistics, e.g.:

For each shape S:
- record the number of matching targets,
- record the runtime of the overall shape.

I'm interested in knowing, basically, the run time per shape. I'm also interested specifically in the runtime of SPARQL constraints, but I think other constraints would be better reported at the level of their containing shape.
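
To make concrete the kind of measurement I mean, here is a rough, purely external sketch that times each explicitly typed sh:NodeShape by validating against it in isolation (copying the shape and its blank-node closure into its own graph). It over-counts shared setup work, misses shapes without an rdf:type sh:NodeShape triple, and doesn't follow references to other named shapes, but it needs nothing from pySHACL's internals; file names are placeholders:

```python
# Coarse, external per-shape timing sketch (no pySHACL internals involved).
# Assumptions: shapes carry an explicit rdf:type sh:NodeShape, and each shape's
# definition is reachable from the shape node purely through blank nodes.
import time

from pyshacl import validate
from rdflib import BNode, Graph, Namespace, RDF

SH = Namespace("http://www.w3.org/ns/shacl#")


def shape_closure(shapes_graph: Graph, shape) -> Graph:
    """Copy a shape plus everything reachable from it via blank nodes."""
    sub = Graph()
    stack, seen = [shape], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for p, o in shapes_graph.predicate_objects(node):
            sub.add((node, p, o))
            if isinstance(o, BNode):
                stack.append(o)
    return sub


data_graph = Graph().parse("data.ttl")        # placeholder file names
shapes_graph = Graph().parse("shapes.ttl")

timings = []
for shape in shapes_graph.subjects(RDF.type, SH.NodeShape):
    single_shape_graph = shape_closure(shapes_graph, shape)
    start = time.perf_counter()
    validate(data_graph, shacl_graph=single_shape_graph)
    timings.append((time.perf_counter() - start, shape))

# Slowest shapes first.
for seconds, shape in sorted(timings, reverse=True):
    print(f"{seconds:8.3f}s  {shape}")
```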

I'm guessing the output could be the original SHACL graph with a few literal properties added to each timed NodeShape/SPARQLConstraint. With shapes frequently being blank nodes, I don't see a better way to represent this. Besides, this output form might feed into a flame graph generator.
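
Just to make that output form concrete, the annotation could be as small as a couple of literals per timed node; the prof: vocabulary below is invented purely for illustration and is not part of SHACL or pyshacl:

```python
# Hypothetical annotation step: attach hit-count and runtime literals directly
# to the (possibly blank) shape node in a copy of the original SHACL graph.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

PROF = Namespace("urn:example:shacl-profile#")  # invented namespace


def annotate_shape(shapes_graph: Graph, shape, target_count: int, seconds: float) -> None:
    shapes_graph.add((shape, PROF.targetCount, Literal(target_count)))
    shapes_graph.add((shape, PROF.elapsedSeconds, Literal(seconds, datatype=XSD.double)))
```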

There's probably also some nuance in how deep the reporting would go, e.g. whether a NodeShape in a sh:or branch within a sh:xone branch would get its own statistics, and how statistics would be rolled up across the branching predicates (e.g. sh:xone, sh:and, sh:or).

I'm not too familiar with the internals of pyshacl, so this ask is predicated on a guess that there is a hookable spot at the shape level that could be timed and hit-counted.

Does this sound doable? Development-light or development-intensive?

I'm aware that the "hit count" has been somewhat covered in Issue 51, but I didn't see runtime measurement noted anywhere in the open or closed Issues.

@ashleysommer (Collaborator)

This is actually something that I've been thinking about for a while.

The debug output of PySHACL is confusing, often not helpful, and leads to issues such as #179.

I'm planning to change the content of the constraint debug info in debug mode. That includes emitting traces for constraints that do not fail, as opposed to only those that fail, so that you can see the execution result and execution order of all constraints. This can and should include profiling information such as time per constraint.

This can further be extended to Shapes, to include (as you suggested) the number of found targets and the overall runtime of each shape.

The first version of this will be released in the next PySHACL version.

@ajnelson-nist (Contributor, Author)

Exciting to hear! And I remember you've been thinking about bits and pieces of this for a while. I look forward to seeing the first pass.

@ashleysommer (Collaborator)

@ajnelson-nist The new PySHACL version v0.22.1 is released and contains the first pass of the new debug output. It is much more verbose now: it emits the duration of each Shape evaluation, the duration of each constraint check, and the current shape evaluation path and constraint evaluation path at each validator step.
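
For reference, turning the new output on from the Python API looks roughly like this (file names are placeholders):

```python
# Enable pySHACL's verbose debug trace, which now includes per-shape and
# per-constraint durations and the current evaluation paths.
from pyshacl import validate
from rdflib import Graph

data_graph = Graph().parse("data.ttl")       # placeholder file names
shapes_graph = Graph().parse("shapes.ttl")

conforms, results_graph, results_text = validate(
    data_graph,
    shacl_graph=shapes_graph,
    debug=True,  # emits the verbose evaluation trace described above
)
```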
