Datafactory errors not propagated properly by StreamParser #308
Thanks, @LaurensRietveld. I think you must be the most prolific N3.js bug finder by now.
I worry that the performance cost of wrapping error handling around factories is too high. I think we should be able to trust factories, especially given that N3.js will have pre-validated the URL. In what cases are factories erroring?
The validation we'd like to perform is validation of literals (based on their datatype) and validation of IRIs. Applying such validation in the data factory, instead of somewhere else (e.g. in a stream after parsing), allows us to use the RDF-JS ecosystem of tools by just plugging in a different data factory, instead of having to wrap all other RDF-JS tools with a custom pre/post-processing step (which kind of defeats the purpose of RDF-JS ;) )
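A factory along these lines could look as follows. This is only a sketch: the minimal base factory and the xsd:integer check are illustrative assumptions, not code from the actual project.

```javascript
// Sketch of a validating RDF/JS data factory that wraps another factory.
// The base factory and the xsd:integer rule below are illustrative.
const XSD_INTEGER = 'http://www.w3.org/2001/XMLSchema#integer';
const XSD_STRING = 'http://www.w3.org/2001/XMLSchema#string';

// Minimal stand-in for an RDF/JS DataFactory (only literal() matters here)
const baseFactory = {
  literal(value, datatype) {
    return {
      termType: 'Literal',
      value,
      datatype: { termType: 'NamedNode', value: datatype || XSD_STRING },
    };
  },
};

// Wrap any factory so that term creation doubles as validation;
// every RDF/JS tool that accepts a factory can then use it unchanged.
function validatingFactory(base) {
  return {
    ...base,
    literal(value, datatype) {
      const term = base.literal(value, datatype);
      // Example check: xsd:integer literals must have an integer lexical form
      if (term.datatype.value === XSD_INTEGER && !/^[+-]?\d+$/.test(term.value)) {
        throw new Error(`Invalid xsd:integer literal: "${term.value}"`);
      }
      return term;
    },
  };
}
```

Passing such a factory to an RDF/JS parser keeps the rest of the toolchain unchanged; only term creation gains checks, which is the appeal of this approach.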
To quickly jump in with my two cents: I think there is definitely value in having validation, but it should be opt-in, given that there are cases in which it is safe to assume the data has already come from another validated source. I also think there is value in having the API for toggling this validation behavior propagated to the RDFJS spec level at some stage.
If we expect a performance decrease, then I'm happy putting this behind a toggle obviously 👍
Well well, let's not exaggerate 🙂 So N3.js offers a parser. The job of the parser is to transform a syntactically valid serialization format into an in-memory representation. RDF/JS is very useful as a standard model for data, and for component interfaces. The purpose of the RDF/JS factory is to create terms in that model, not to validate them beyond it.
That's possible. The N3.js parser has streaming output in an RDF/JS-compatible format. The logical way to do validation is not to interfere with the parser (whose job is only the transformation of Turtle into a valid RDF in-memory model, which all of its output is). Instead, chain an RDF/JS-compatible parser into an RDF/JS-compatible validator, which checks whether the model-valid RDF nodes (which the parser is guaranteed to output) also meet other, non-RDF validation constraints (such as whether literals have the right type).

Now if you must do this in the factory, recall that the factory is in full control of whatever it outputs. So nothing is stopping you from, for instance, outputting your own kind of terms. But I see no architectural or performance need to catch non-model errors in the parser. Attaching the validator as an immediate next step in the stream will not delay the validation in any way, given that N3.js has a fully streaming output.
N3.js will only instantiate valid IRIs (as this is an RDF model requirement). But if this validation includes more specific constraints, then indeed that goes beyond the parser's job of generating a valid model.
The thing is, it would require different code paths, because currently we make assumptions (for performance reasons) about where errors can occur and where they cannot; where they cannot, we avoid an expensive try/catch. Simply said: you either wrap the whole thing in a try/catch or you don't. Which I could always do, of course, but I just don't see a convincing case yet: having streaming non-model validation after the streaming model parser is the right separation of concerns, does not sacrifice performance, and allows us to make stronger assumptions about what can fail and what cannot.
We'll probably settle on an option similar to this one, where we'll wrap the N3 parser interface and expose a stream with proper error handling.
Nice! I did not know that validating IRIs was in scope for N3 (as we've had issues with IRIs that were invalid according to RFC 3987).
In general, I'd expect a parser (or any other tool that creates RDF-JS terms for me) to return errors when the input does not match the spec. I agree that's difficult in this case without increasing the scope of N3, considering we have several specifications to take into account: the Turtle family, the IRI spec, XSD datatypes, and possibly other datatypes.
Great! Do let me know if there are any issues; I definitely want to support your use case (even if in a slightly different way).
Do let me know; the Turtle grammar is strict as to what URLs are allowed, and N3.js follows that.
Agreed; we might just differ in what we consider "the spec". For me, that's RDF 1.1 and Turtle.
Well, we don't—I just think that this one particular way of validating clashes with performance goals, and that there is a better way of implementing it that offers better performance and other benefits.
And that's where we differ as well. RDF/JS indeed has a factory interface, but it does not make validation part of that interface's contract.
No need to wrap; just chain a validating stream after it. It's literally a single pipe call. So far, I have not heard drawbacks to implementing validation after the parser—in fact, I think there are only advantages.
If a custom data factory throws an error, the StreamParser does not gracefully emit this error.
A unit test to reproduce (I failed to find out how to style my code to satisfy the linter, so I'm copy/pasting it here ;)):
Use case: custom validation that happens in a data factory