JSON-Schema comparison #254

Open
nicolaiarocci opened this issue Aug 24, 2016 · 26 comments

Comments

@nicolaiarocci
Member

nicolaiarocci commented Aug 24, 2016

It would be nice to add a section to the docs with the main differences between Cerberus and jsonschema, possibly including some use cases where one is a better tool than the other.

I get asked this a lot and unfortunately I do not have a good answer, as I don't know jsonschema well enough. When I started work on Cerberus a few years ago, jsonschema was not out yet or I had not come across it (I suspect the two projects are about the same age).

So, this is up to contributors who know better than me.

@funkyfuture
Member

funkyfuture commented Sep 8, 2016

phew, i remember when i was looking for a validation library, i had to choose between these two and jsonschema seemed too unpythonic.

just had a quick look at the docs. still the same. certainly a good choice when you have to deal with legacy schemas. how to extend these validators seems to be a well kept secret.

@rredkovich
Contributor

rredkovich commented Sep 8, 2016

A few weeks ago I worked with Connexion, which generates a Flask code skeleton from a Swagger spec and uses jsonschema as its validation library.

I ran into two things that could be better:

  1. Partial error messages: with 5 invalid fields, the report includes only one. After fixing that field it includes the second, and so on.
  2. In some cases the error messages are not good for debugging, like "NoneType is not of type str".

@txomon

txomon commented Nov 11, 2016

This is not a real comparison, just feedback from my work on this. I created a Cerberus-to-jsonschema converter for Swagger, so that some serializers we use internally are auto-documented with Swagger.

I found some differences that look relevant to this thread.

Observations

Their object definition includes an interesting field called definitions, which makes the schema self-contained and serializable: all entities in jsonschema live inside the schema, which acts as a namespace, so they can refer to each other through references (instead of circular dependencies). We don't have that need because we can simply include an object in itself.
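
To illustrate the definitions point, here is a minimal sketch of a self-contained JSON Schema document with a recursive reference. The `resolve_ref` helper is hypothetical (it is not part of jsonschema or Cerberus), handles only local `#/...` references and ignores JSON Pointer escaping.

```python
import json

# A self-contained schema: "node" refers to itself through "$ref",
# so the document stays serializable even though the structure is recursive.
tree_schema = {
    "definitions": {
        "node": {
            "type": "object",
            "properties": {
                "value": {"type": "integer"},
                "children": {
                    "type": "array",
                    "items": {"$ref": "#/definitions/node"},
                },
            },
        }
    },
    "$ref": "#/definitions/node",
}


def resolve_ref(root, ref):
    """Hypothetical helper: follow a local reference such as '#/definitions/node'."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are handled in this sketch")
    target = root
    for part in ref[2:].split("/"):
        target = target[part]
    return target


node = resolve_ref(tree_schema, tree_schema["$ref"])
items_ref = node["properties"]["children"]["items"]["$ref"]

# The nested reference resolves to the very same definition object.
same_definition = resolve_ref(tree_schema, items_ref) is node

# Because references are plain strings, the whole schema round-trips through JSON.
round_tripped = json.loads(json.dumps(tree_schema))
```

A Python dict that literally contains itself could not be dumped this way, which is the serialization concern raised above.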

They not only validate but also document objects: their schemas have titles, descriptions, etc.

A big difference that gave me a lot of headaches (I even thought it was a bug, OAI/OpenAPI-Specification#794) is that whether a field is required is defined on the parent and not on the field itself. At first it looked like a horrible design decision, but after thinking about it for a while, it looks like a good one.
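
A small sketch of that difference, using plain dicts only. The field names and the `collect_required` helper are made up for illustration; neither library ships such a function.

```python
# Cerberus style: each field states for itself whether it is required.
cerberus_schema = {
    "name": {"type": "string", "required": True},
    "age": {"type": "integer"},
}

# JSON Schema style: the parent object lists its required properties.
json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name"],
}


def collect_required(schema):
    """Hypothetical helper: gather per-field 'required' flags into a parent-level list."""
    return sorted(
        field for field, rules in schema.items() if rules.get("required", False)
    )


required_fields = collect_required(cerberus_schema)
```

With the flag on the parent, the same field definition can be reused in one object where it is required and in another where it is optional.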

They don't support nullable or empty.

Personal opinion

The fact is that most of the world is forcefully using jsonschema.

I think we should be jsonschema compatible, because Cerberus has far better feedback messages and some more features (nullable/empty, coercion, defaults, etc.)

Adapt the schema

The incompatibilities with jsonschema:

  • Schema self contained and serializable
  • Required is on the parent

I would like to change how our schema works so that it has the same structure as JSON Schema, that is, the object definition would be an object definition in its own right, not directly the children of an object.

This would allow us to have the definitions part and to have references. This may not be very compatible at the moment, but I think it would enable us to easily do the second step.

Create importer/exporter

I really like how Cerberus is and the functionality it provides; however, most of the industry is using jsonschema. Because Cerberus is almost a superset of that functionality, once the schema has been adapted for the limitations, there shouldn't be any problem importing/exporting jsonschema to/from Cerberus.

@funkyfuture
Member

interesting. are you aware of the schema registry? with this a conversion should be possible, shouldn't it?

@nicolaiarocci would you accept a converter in the schema module?

@nicolaiarocci
Member Author

@funkyfuture absolutely.
@txomon thanks for sharing your experience.

@txomon

txomon commented Nov 12, 2016

@funkyfuture I wasn't aware of that one!

So, the constraints at the moment are:

  • We wouldn't be able to serialize if there is circular dependency using objects (it's ok if using registries).
  • We would have to inject all the definitions in the root object.

The only thing to adapt then is the nullable, empty, required, etc. attributes. How can we reuse a definition if it defines on itself those characteristics?

We should either be capable of overriding from the reference, or change how cerberus schema definitions work and have it defined on the parent (quite breaking change IMO). Any ideas?

@nicolaiarocci I will try to get the work I did at @Ridee into a PR

@funkyfuture
Member

funkyfuture commented Nov 12, 2016

We wouldn't be able to serialize if there is circular dependency using objects

the 'support' for this in cerberus is gone anyway, afaik.

The only thing to adapt then is the nullable, empty, required, etc. attributes. How can we reuse a definition if it defines on itself those characteristics?

i'm not sure whether i'm getting you right here. but a feature that isn't supported in jsonschema should raise an error when converting to it.
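
A minimal sketch of that behaviour for a single field, under stated assumptions: the type and rule mappings below are chosen for illustration and are not an agreed design, and `convert_field` and `UnsupportedRule` are hypothetical names, not Cerberus API.

```python
# Cerberus type names mapped to JSON Schema type names (assumed mapping).
TYPE_MAP = {
    "string": "string",
    "integer": "integer",
    "float": "number",
    "boolean": "boolean",
    "list": "array",
    "dict": "object",
}

# Rules that only need renaming in this sketch (assumed to be equivalent).
RENAMED_RULES = {"min": "minimum", "max": "maximum", "allowed": "enum"}


class UnsupportedRule(ValueError):
    """Raised when a Cerberus rule has no counterpart in this sketch."""


def convert_field(rules):
    """Hypothetical converter for the rules of a single field."""
    converted = {}
    for rule, constraint in rules.items():
        if rule == "type":
            if not isinstance(constraint, str) or constraint not in TYPE_MAP:
                raise UnsupportedRule("type=%r" % (constraint,))
            converted["type"] = TYPE_MAP[constraint]
        elif rule in RENAMED_RULES:
            converted[RENAMED_RULES[rule]] = constraint
        else:
            # Anything unknown (coerce, default, ...) fails loudly
            # instead of being dropped silently.
            raise UnsupportedRule(rule)
    return converted


converted = convert_field({"type": "integer", "min": 0})

try:
    convert_field({"type": "string", "coerce": str})
except UnsupportedRule as exc:
    failed_rule = str(exc)
```

Failing on the first unknown rule keeps the exported schema from looking stricter or looser than the Cerberus original.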

We should either be capable of overriding from the reference, or change how cerberus schema definitions work and have it defined on the parent (quite breaking change IMO). Any ideas?

a conversion may be done from/to a cerberus validator rather than a cerberus schema as you can bind a schema registry to a validator. would that have any downside? edit: in that case i'm not sure that the converter should live in the schema module.

the implementation should solely rely on the stdlib's json module or the feature should only be available if jsonschema is available in the environment. platforms that lack a needed json feature in the stdlib don't need to be supported, imo.

sidenote: if this really achieves 100% compatibility, cerberus could also run against jsonschema reference tests.

oh, this is going off-topic.

@txomon

txomon commented Nov 12, 2016

the 'support' for this in cerberus is gone anyway, afaik.

Sure, but if someone happens to still have it, it would cause an infinite recursion problem. if you say it's gone, then it's safe to do so...

i'm not sure whether i'm getting you right here. but a feature that isn't supported in jsonschema should raise an error when converting to it.

We can extend it, extensions are defined in jsonschema already, http://json-schema.org/latest/json-schema-core.html#rfc.section.5.4

a conversion may be done from/to a cerberus validator rather than a cerberus schema as you can bind a schema registry to a validator. would that have any downside?

Yeah. Because I had to code it externally (without modifying Cerberus), my converter works on the schema directly, but doing it on the validator would indeed be the appropriate thing to do.

Also, the original question about these attributes being defined on the child rather than on the parent remains... :/

@t2y

t2y commented Feb 27, 2017

I was just looking into what the difference between Cerberus and jsonschema is, and then I found this issue. Adding some documentation about this to Cerberus would be helpful for new users who are investigating the technical details.

@pavel-shpilev

I found this while looking for a comparison between voluptuous (https://github.com/alecthomas/voluptuous) and Cerberus. I have only used voluptuous, but from a brief review of the docs, Cerberus looks pretty similar. Has anybody tried both? A comparison between multiple tools would be great.

@funkyfuture
Member

another one popped up: https://github.com/gaojiuli/xdata - seems similar to voluptuous

and one targeting jsonschema: https://github.com/guyskk/validr

we could collaborate on a feature comparison matrix with a spreadsheet on https://ethercalc.net/

@funkyfuture
Member

i don't know when i will continue with the comparison matrix i started, feel free to check and amend: https://ethercalc.org/y41wgbonovm1

@txomon

txomon commented Apr 24, 2017

@funkyfuture could you explain a little what those rows mean? I don't find them explanatory enough... Specifically the blocking / non-blocking part.

Cheers!

@funkyfuture
Member

@txomon i amended a legend.

@mitar

mitar commented Nov 29, 2017

They don't support nullable or empty.

JSON schema has null type: https://spacetelescope.github.io/understanding-json-schema/reference/null.html

So one can say "type": ["string", "null"] and this means that a value can be a string or a null.

Empty can be defined depending on the type: a string with max length zero, or an object with no properties and no extra properties allowed.
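
These points can be sketched with plain dicts. The `add_nullable` helper is hypothetical, and treating `minLength: 1` as the counterpart of Cerberus' `empty: False` for strings is an assumption of this example.

```python
def add_nullable(json_field):
    """Return a copy of a JSON Schema field definition that also accepts null."""
    if "type" not in json_field:
        # No type constraint at all: null is already accepted.
        return dict(json_field)
    types = json_field["type"]
    if isinstance(types, str):
        types = [types]
    result = dict(json_field)
    result["type"] = list(types) if "null" in types else list(types) + ["null"]
    return result


# Rough counterpart of Cerberus' {"type": "string", "nullable": True}.
nullable_string = add_nullable({"type": "string"})

# "Empty" expressed per type, as described above.
empty_string = {"type": "string", "maxLength": 0}
empty_object = {"type": "object", "properties": {}, "additionalProperties": False}

# Assumed counterpart of Cerberus' {"type": "string", "empty": False}.
non_empty_string = {"type": "string", "minLength": 1}
```

So both features are expressible, but per type rather than through a single keyword as in Cerberus.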

@CJ-Wright

A description field would be nice.

@funkyfuture
Member

funkyfuture commented Jan 25, 2018

here's another lib, inspired by Javaland: https://github.com/Grokzen/pykwalify

@tyomo4ka

tyomo4ka commented Jul 2, 2018

I had to make a choice recently between JSON Schema and Cerberus.

Picked Cerberus. Here are my considerations:

  1. Super easy to extend with your custom rules.

  2. It offers normalization rules: http://docs.python-cerberus.org/en/stable/normalization-rules.html. I couldn't find how to implement something like this in JSON Schema.

  3. Cerberus to OpenAPI conversion can be done easily. I spent less than a day writing a converter; it's specific to my use case and I didn't have time to package it as open source, but I will try later.

  4. Description and any other "meta" fields can be easily done with this trick:

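    from cerberus import Validator

    # DescribedValidator is an illustrative name for your Validator subclass.
    class DescribedValidator(Validator):
        # Cerberus calls rule methods with (constraint, field, value).
        def _validate_description(self, description, field, value):
            """
            Allows a description field on the structure

            The rule's arguments are validated against this schema:
            {'type': 'string'}
            """
            # Intentionally a no-op: the rule only carries metadata.
            pass
  5. I had to deal with many isolated schemas rather than with one big schema, where the "definitions" feature would be useful.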

@primordialstew

Thank you for this thread! It raises a question I immediately had when I came across Cerberus, which was: "This tool looks awesome! I wonder why it is 'rolling its own' schema specification, instead of using JSON-Schema?"

Folks are raising a lot of interesting points and questions, but there seems to be some conflation/comparison of tools with importantly different scopes and responsibilities.

Maybe it would be helpful to clarify some of these scopes. My understanding is as follows:

  1. JSON-Schema is a schema specification, not an implementation.
  2. jsonschema is one of several libraries that attempt to implement JSON-Schema.
  3. Cerberus is also a validation library, but it defines/implements its own schema specification.

So Cerberus defines its own schema specification. For the sake of this discussion, let's give it an arbitrary name, Cerb-Schema. Comparing Cerberus to jsonschema is an apples-to-apples comparison; both are validation libraries. Comparing Cerb-Schema to JSON-Schema is an apples-to-apples comparison; both are schema specifications. Comparing Cerberus to JSON-Schema is an apples-to-oranges comparison; Cerberus is a library, JSON-Schema is a spec.

Okay, with luck that is clear and not too controversial?

If so, then I think a key question is: can Cerberus be adapted to support an alternate schema specification, namely JSON-Schema? That would mean that projects with existing data definitions written to the JSON-Schema spec could use Cerberus as a validation library without having to create a duplicate definition in "Cerb-Schema". This is a different question from the OP's, which is "can someone create a section in the documentation that compares the Cerberus and jsonschema validation libraries", but it addresses some of the discussion that emerged in the thread. Perhaps it would be valuable to create a new thread: "Add support for JSON-Schema schema definitions"?

@primordialstew

P.S. there is some mention of Swagger/OpenAPI. This is a really interesting and useful tool, but its scope is to provide a specification and a reference implementation for defining RESTful web applications. It constitutes a particular use case for schema specifications, data definitions and validation, but it is an apples-to-apples comparison neither to validation libraries (Cerberus, jsonschema) nor to schema specifications ("Cerb-Schema", JSON-Schema).

@CMCDragonkai

It would be nice to interoperate with JSON Schema. This would make the same schema portable between different environments that perform validation.

@jim-bo

jim-bo commented Apr 25, 2019

I've worked with Cerberus for a number of years but have recently been required to use json-schema to define the data model (for increased compatibility with other projects). The Python validator for json-schema is pretty basic; a port of the Cerberus validator logic (including things like coercion and nullable) to support json-schema would be grand.

@ssbarnea

ssbarnea commented Feb 2, 2020

Any updates on this? I am really interested in exporting a JSON Schema definition out of a Cerberus one, because that format has become kind of a standard, with hundreds of adopters and tools already using the schemastore.

@Jerry-Ma

Jerry-Ma commented May 27, 2020

I came across this thread and it seems that no one has mentioned this: https://github.com/keleshev/schema . This package provides a very pythonic way of doing schema validation, and it provides a converter to jsonschema. I was wondering what are the differences between Cerberus and this schema package.

@macks22

macks22 commented Jan 14, 2021

Another package that seems quite relevant to this conversation is pydantic.

It provides what I perceive to be an extremely Pythonic validator for JSON Schema and object mapping between JSON Schema and Python BaseModel classes. These are quite similar in principle to dataclasses and the package actually provides a simple way to convert existing dataclasses to its object model. It also provides a simple interface for defining custom validation logic in Python and a variety of integrations with other tools, e.g. mypy, PyCharm IDE, ORMs.

So +1 to the idea of being able to go back and forth between Cerberus schemas and JSON Schema, since that would then make it possible to easily convert from Cerberus to pydantic and vice-versa.

One other note: one of the big design principles in pydantic appears to be an emphasis on speed; thus it provides a benchmark comparison to other validators, including cerberus. This is only a comparison in terms of speed though; other considerations like usability, comprehensiveness of features, dev team responsiveness and project longevity, etc. are not addressed there.

@ssbarnea

@macks22 That came at a very good time, as I need to build a simple JSON Schema to enforce a file format, but I still have PTSD from a year ago when I did some work on it. The JSON Schema format seems to be anything but human-friendly and gives me the impression that only the spec authors are able to write schemas in it.

If you can give some hints on how to produce a JSON Schema from anything else (preferably Python), that would be awesome. I still need to produce JSON Schema because it is the official shareable format, supported by many tools and editors (VS Code). Best of all would be if I could use Python 3.6 typing support to define the data types.
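
As a rough idea of what producing a JSON Schema from typing information could look like, here is a standard-library-only sketch. It is not the API of pydantic, Cerberus or jsonschema: `to_json_schema`, `PRIMITIVES` and the `Person` class are hypothetical, only flat classes with primitive annotations are handled, and a field with a class-level default is assumed to be optional.

```python
import json
from typing import get_type_hints

# Python annotations mapped to JSON Schema type names (primitives only).
PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}


class Person:
    """Example data type described with plain annotations."""
    name: str
    age: int
    nickname: str = ""  # has a default, so it is treated as optional


def to_json_schema(cls):
    """Hypothetical helper: derive a flat JSON Schema from class annotations."""
    properties = {}
    required = []
    for field, annotation in get_type_hints(cls).items():
        if annotation not in PRIMITIVES:
            raise TypeError("unsupported annotation for %r: %r" % (field, annotation))
        properties[field] = {"type": PRIMITIVES[annotation]}
        if not hasattr(cls, field):
            # No class-level default, so the parent object must require it.
            required.append(field)
    schema = {"title": cls.__name__, "type": "object", "properties": properties}
    if required:
        schema["required"] = required
    return schema


person_schema = to_json_schema(Person)
print(json.dumps(person_schema, indent=2))
```

Libraries such as pydantic, mentioned above, cover nested models, optional types and much more; the sketch only shows the basic direction of the mapping.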

@funkyfuture funkyfuture pinned this issue Jul 23, 2023