Add priority to decorator processors #600

lgo · 2017-03-12T09:56:09Z

In the documentation, it's quite explicit about there being no prioritization of processors. I believe this would be a great addition for a few reasons. It allows more complex post-processing flows.

For example, the case I need priority is where an abstract schema will post-process into a model object, but before that must be some specialized post-processing in an implementation of the schema. This is less than ideal to implement using the suggested pattern due to the post-processor belonging to the abstract class.

This will retain backward compatibility, given that every processor will default to the same priority and the disclaimer in the documentation.

I'm open to suggestions for an alternative interface for this. Defaulting all processors to 0 would require other processors to run before them to have negative values, so likely a non-zero priority is more appropriate.

ps. love marshmallow, and all of the work that y'all do for it! The documentation is among the best that I've experienced, and every time I read through I discover a new solution to an anti-pattern I've had :).

sloria · 2017-03-14T02:49:44Z

Thanks for the PR and the kind words, @xLegoz . Looks potentially useful and the implementation is straightforward. It seems, however, that it would be straightforward to meet your use-case using a BaseSchema that exposes hooks, like so:

from marshmallow import Schema, post_load

class BaseSchema(Schema):
    @post_load
    def post_load(self, data):
        postprocessed = self._postprocess(data)
        return self._make_object(postprocessed)

    def _postprocess(self, data):
        return data

    def _make_object(self, data):
        return data

Would this meet your use case?

lgo · 2017-03-14T07:27:24Z

Something like that does work out, but with quite a bit of hacking for the project I have, and we've got a lot (probably too many) post-processors we'd need to change for it. At the moment, this has been the best solution and the project is using the forked version so it's not a big deal :).

sloria · 2017-03-19T19:55:14Z

Sounds good. I'm not opposed to merging this, but I'd like to take some time to collect my thoughts and get feedback on this.

For one, I think it might make sense for higher-priority methods to execute first, i.e. execute in descending priority order. This seems more intuitive, and also addresses the issue of having 0 as the default priority, as you mentioned in your original post.

douglas-treadwell · 2017-06-09T22:12:30Z

Interesting idea, @xLegoz!

I agree that it would be more intuitive for higher priority methods to execute first. This just makes sense in English. "High priority" is more urgent than "low priority".

However, I lean toward @sloria's view that the same behavior could be accomplished by writing one @post_load handler that calls several other functions. Where possible, this would be easier to follow than scattered methods executed in order of priority decorators.

You could even define your own @post_load_priority(priority) decorator that would register any number of methods on the Schema as (for example) self._post_load_methods. Your @post_load function could then call all those methods in sorted order, without needing to know all the methods when the @post_load method is defined. Subclasses could define their own methods and specify their own post_load_priority.

Another thought is, is there any way in your design to provide named hooks rather than numeric priorities? If you have a situation where subclasses are inserting processing steps, I wonder if it'd be easier to read and maintain if you define specific steps that subclasses could hook into (which may be what @sloria meant).

embray · 2017-12-03T17:47:55Z

This is useful, however, when subclassing other schemas that may already have some processors defined on it (say, some post_load). There are some ugly hacks to get around that, but it would be nice if, given the priorities of existing processors (assuming their priority levels are reasonably well spaced out) that a subclass can add additional processors in the same category in a reasonable order with respect to existing processors.

An alternative to this might be to add something like before= or after= arguments to a processor tag, where an existing processor is given. Then, rather than relying on admittedly shaky priority-levels one could explicitly order processors with respect to each other.

lgo · 2017-12-05T02:04:17Z

@embray I do agree with your reasoning for the alternative approach with before= and after=. As it's already been mentioned, defining an ordering with numbers is not very intuitive.

There is some ambiguity about how you might specify the functions, so there definitely needs to be some thought there for the most approachable interface. e.g. a few different ways the interface could work could be:

@post_processor(before=ThisClass.processor_that_runs_after)
@post_processor(before="processor_that_runs_after")

(Of course, if this could functionally be provided as a library that would also be pretty wicked for an experimental interface!)

embray · 2017-12-11T16:51:25Z

Yeah, I might give that a shot. There are definitely areas where one can run into trouble with this (e.g. cycles). Normally that wouldn't happen but it would probably be good to check.

vgavro · 2018-02-10T19:11:17Z

Greetings. I really think that latter method is too complicated for such task. Pros - it's more explicit. Cons - more work to support and implement (cycles that mentioned by @embray, for example), if processor you're depending on is renamed or not exist you should throw an error? And what if processor is applied dynamically on schema instance creation? You need priorities (or order) only for schema sub-classing and mixins. It's really not such a big job to determine integer for it (and you need it in your own code, in most usecases). But on using "after/before" it would not be obvious when two processors without "after/before" is executed (at least I can't get straight picture right now). So in the end - we would get more explicit(+) but more complicated(-) behavior. About third-party libraries with several processors - in such rare cases - they may use "order" with step 10 to allow user-code insert their processors when needed (for example popper.js use step 50 to allow user-code insert "modifiers"). So my vote - to merge this pull request as is (maybe rename "priority" to "order"). Thank you guys for all your work!

lafrech · 2018-11-22T13:36:30Z

The more we answer various use cases with

this is a job for [pre|post]_[dump|load] decorators

the more this is needed.

I'll mark it as 3.0. Feel free to postpone if you think it is not so crucial.

I also agree that using integers has a theoretical limitation that is not a real issue in practice. Python logging module also uses integer logging levels, and defines default levels only for tens, leaving a lot enough space to define custom levels.

At least, integer priorities would be better than any twisted workaround to enforce the order of validators from different code layers or even different libraries.

lafrech · 2018-11-22T13:37:52Z

marshmallow/schema.py

+
+    def _get_priority(self, attr_name, tag):
+        processor = getattr(self, attr_name)
+        return processor.__marshmallow_kwargs__[tag].get('priority', 0)


Do you need that default value for priority?

Since a default is set above, this should work:

return processor.__marshmallow_kwargs__[tag]['priority']

lafrech · 2018-12-11T17:08:43Z

Setting 3.0 milestone on this so as not to forget it as it seems feasible. Guys, feel free to postpone.

deckar01 · 2018-12-11T19:11:07Z

Try overriding the abstract hook with the specialized hook and explicitly declare the dependency order by choosing where to call super.

from marshmallow import Schema, fields, post_load


class BaseModel:
    def __init__(self, data):
        self.data = data

class Base(Schema):
    id = fields.Str()
    
    @post_load
    def make_model(self, data):
        return BaseModel(data)

class Special(Base):
    name = fields.Str()
    
    @post_load
    def make_model(self, data):
        data['name'] = data['name'].title()
        return super().make_model(data)


result = Special().load(dict(id='1', name='test'))
print(result, result.data)

Note: I couldn't actually reproduce the reported issue when building this minimal example. It just demonstrates that inherited hooks can can control the execution order of their base class's hooks, because the base hook is only called once.

lafrech · 2018-12-11T19:20:21Z

This does not solve the case of having a make_model hook in base class and an unrelated post_load hook in a child class. Well, it does if you ensure to name the unrelated hook make_model, but maybe you don't want to do that.

Also, priorities would allow marshmallow-based framework to use hook while application code could use hooks as well. Application would just have to watch the priority numbers in the framework. Hence the 10-step idea described above by @vgavro.

If most people are satisfied with current implementation (where base class needs to define hooks for the application and children classes override them), I won't struggle for this feature to be integrated.

deckar01 · 2018-12-11T19:28:07Z

Python has a builtin way to solve method resolution order in inherited classes. Naming the methods the same thing has the benefit of explicitly declaring the dependency relationship. This is much more explicit than a number. You can keep the special method name, just don't add the decorator to it and you can call it from inside the overridden hook.

class Special(Base):
    name = fields.Str()
    
    def fancy_name(self, data):
        data['name'] = data['name'].title()
        return data

    @post_load
    def make_model(self, data):
        data = self.fancy_name(data)
        return super().make_model(data)

lafrech · 2018-12-16T22:36:40Z

Good point, @deckar01. Pushing this logic even further, the whole decoration mechanism could be removed, the feature being accomplished by hooks with fixed names in the base schema.

class BaseSchema(base.SchemaABC):

    def pre_load(data):
        """Override this to add pre-load logic"""
        return data

The invocation of the hooks would be greatly simplified.

This kind of hook is quite common, simpler than current decoration mechanism, and as you say, it relies on usual Python method resolution order.

Maybe it's not so obvious, due to the many / pass_many combinations, I'd need to investigate a little more, but I like the idea.

It would be a little regression for people using multiple decorated methods with no care about the order (e.g. using two independent post_load methods), as they'd have to merge their processors into a common method, but I think this is acceptable. At least we'd have predictable order.

vgavro · 2018-12-20T09:36:57Z

I agree that using python MRO and decorators at same time is counterintuitive, because you should rely on parent custom method names in that case, which may looks good in one-parent-one-child example, but can be real pain if you're defining complex schemes with mixins.
But I like the whole idea to remove decorators and use MRO instead.
Problem with many/pass_many can be resolved with special methods name: post_load_many, which will just apply self.post_load for every item in collection by default.
And if we agree on this #1034 (comment) (always passing **kwargs with original, partial, etc. instead of requesting arguments in decorator) - we could drop current complex implementation and have clear priority with only disadvantage of changed api, which is expected in new major release anyway.
validates_schema option skip_on_field_errors=False can by achieved by explicitly processing ValidationError exception.
@validate(field_name) decorator should be replaced by validate_field(name, value, field=field_obj) then not to break consistency and have order control! (if name == 'my_field': do_something)
So, this looks like a big change. As I can see it - or we're relying on MRO and drop decorators, or we're adding "priority" to keep current decorators API.
Argument that "you can define custom names and then use one post_load, and if post_load already defined in parents - you should override all of them to keep things under control" just don't looks good to me.

deckar01 · 2018-12-20T14:31:01Z

Decorators are nice, because they explicitly communicate that the method is part of the schema life cycle and not just another method. It is also convenient for use cases that are order independent to be able to declare hooks without having to declare a dependency order. Feel free to open an issue to discuss removing the decorators, but I don't think it is necessary to satisfy this original use case.

This pattern is very similar to overriding methods with static/class decorators. We could document this pattern to make it more intuitive and cover it with tests to ensure we maintain support.

sloria · 2019-09-11T18:26:32Z

Is there still interest in this? Seems like the OP can be met using the methods above. Unless there are strong feelings about adding this API, I'll plan to close this in a few days.

lafrech · 2019-09-12T07:40:47Z

I still think things would be easier if we had hooks instead of decorators. easier to maintain, easier to understand. Simple Python and inheritance. The code in this lib to support hooks would be very minimal.

In my app, I'm sometimes declaring post_load methods then overriding them in child classes and I'm reaching a point where it would be saner to define hooks in my base Schema using decorators and then don't use the decorators in rest of the code to avoid messing things up.

I don't have the time to draft an implementation right now.

vgavro · 2019-09-18T23:48:22Z

This is response to @deckar01 comment, which happened to be last for a long time, and now I see that it looks like everyone is happy about it, which is not :-)
Also, in last @lafrech comment I believe when he say "hooks" he means MRO, and in my and @deckar01 comment when I say "hooks" I mean current decorator implementation (just to be clear).

[In short - I'm completely support idea to remove decorators in favor of clean MRO, but if it's not to be done - at least add "order" to hooks, because for hooks that modify returned data and data which passes to next hooks - order is crucial]

"Decorators are nice, because they explicitly communicate that the method is part of the schema life cycle and not just another method."
-- special method name is not the same as "just another method". Again, when you're overriding some parent method in python you don't mark it special way - and seems like python community ok with it.
"It is also convenient for use cases that are order independent to be able to declare hooks without having to declare a dependency order."
-- Well, it is, but when "hooks" can return some data, on which other hooks can depend - like pre_load, post_load, pre_dump, post_dump - order is crucial, and this decorators are used mostly for that. So - hooks are cool. for validation which is not changing value. but for input/output changes - hooks are evil.
"Feel free to open an issue to discuss removing the decorators, but I don't think it is necessary to satisfy this original use case."
-- That's not my point. my point is - IF we use decorators (hooks) - we SHOULD allow them order, because it's normal practice and it's crucial even more, because other hooks depends on returned data. If we use MRO - then api must act like MRO, without messing things with MRO/hooks. "There should be one-- and preferably only one --obvious way to do it."
"This pattern is very similar to overriding methods with static/class decorators. We could document this pattern to make it more intuitive and cover it with tests to ensure we maintain support."
-- Really, I'm also using this pattern not to mess things up..

class BaseModel:
  @post_load()
  def post_load(self, data, **kwargs):
    return data
  # same pre_load, post_load, pre_dump, post_dump

but if we're writing library which is not used in one small project, but have it's own ecosystem around it - i'm expecting one obvious way to interact with schemas in other libraries without pain.

lafrech · 2019-09-19T07:04:32Z

Thanks @vgavro for clarifying what I meant.

ringus1 · 2023-04-12T09:35:49Z

FYI this is already possible by properly alphabetising processors, since resolve_hooks is using dir: https://github.com/marshmallow-code/marshmallow/blob/dev/src/marshmallow/schema.py#L166

Not sure if it's omitted in the documentation on purpose or initial implementation didn't intend it to work this way ;)

lafrech · 2023-04-12T09:58:52Z

No it wasn't intended, but it's nice to know.

This dates back to when dicts were not ordered, I suppose dir wasn't either.

ringus1 · 2023-04-12T10:07:17Z

Ok. Though dir has always returned list.
https://docs.python.org/2.6/library/functions.html#dir

sloria added needs review feedback welcome labels Mar 19, 2017

Add priority param to processors

8b440c8

lgo force-pushed the dev branch from 55ff363 to 8b440c8 Compare May 25, 2017 23:42

sloria mentioned this pull request Jul 16, 2017

Improvement: Schema tags: added "order" option #654

Closed

lafrech mentioned this pull request Nov 16, 2018

Field: respect single responsability principle, don't access parent object #1046

Open

lafrech reviewed Nov 22, 2018

View reviewed changes

lafrech mentioned this pull request Dec 11, 2018

Order of executing for many validates_schema #1068

Closed

lafrech added this to the 3.0 milestone Dec 11, 2018

sloria removed this from the 3.0 milestone Dec 26, 2018

lafrech mentioned this pull request Oct 31, 2019

post_load decorated functions with same name are not executed #1439

Closed

sloria mentioned this pull request Feb 16, 2021

decorator validates add order parameter #1720

Closed

lafrech mentioned this pull request May 12, 2021

Make customizing Schema behavior with subclasses easier #1812

Open

lafrech mentioned this pull request May 29, 2021

Ordered hook execution #1816

Closed

sirosen mentioned this pull request Jun 30, 2021

Should marshmallow v4 set required names for hooks? #1841

Open

lafrech mentioned this pull request Aug 9, 2022

Marshmallow hooks on schema inheritance #2031

Closed

lafrech mentioned this pull request Mar 27, 2023

nondeterministic behavior when defining custom post_load method in class inheriting SQLAlchemySchema marshmallow-code/marshmallow-sqlalchemy#502

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add priority to decorator processors #600

Add priority to decorator processors #600

lgo commented Mar 12, 2017

sloria commented Mar 14, 2017

lgo commented Mar 14, 2017

sloria commented Mar 19, 2017

douglas-treadwell commented Jun 9, 2017 •

edited

embray commented Dec 3, 2017

lgo commented Dec 5, 2017 •

edited

embray commented Dec 11, 2017

vgavro commented Feb 10, 2018

lafrech commented Nov 22, 2018

lafrech Nov 22, 2018

lafrech commented Dec 11, 2018

deckar01 commented Dec 11, 2018

lafrech commented Dec 11, 2018

deckar01 commented Dec 11, 2018 •

edited

lafrech commented Dec 16, 2018

vgavro commented Dec 20, 2018 •

edited

deckar01 commented Dec 20, 2018

sloria commented Sep 11, 2019

lafrech commented Sep 12, 2019

vgavro commented Sep 18, 2019

lafrech commented Sep 19, 2019

ringus1 commented Apr 12, 2023

lafrech commented Apr 12, 2023

ringus1 commented Apr 12, 2023

Add priority to decorator processors #600

Are you sure you want to change the base?

Add priority to decorator processors #600

Conversation

lgo commented Mar 12, 2017

sloria commented Mar 14, 2017

lgo commented Mar 14, 2017

sloria commented Mar 19, 2017

douglas-treadwell commented Jun 9, 2017 • edited

embray commented Dec 3, 2017

lgo commented Dec 5, 2017 • edited

embray commented Dec 11, 2017

vgavro commented Feb 10, 2018

lafrech commented Nov 22, 2018

lafrech Nov 22, 2018

Choose a reason for hiding this comment

lafrech commented Dec 11, 2018

deckar01 commented Dec 11, 2018

lafrech commented Dec 11, 2018

deckar01 commented Dec 11, 2018 • edited

lafrech commented Dec 16, 2018

vgavro commented Dec 20, 2018 • edited

deckar01 commented Dec 20, 2018

sloria commented Sep 11, 2019

lafrech commented Sep 12, 2019

vgavro commented Sep 18, 2019

lafrech commented Sep 19, 2019

ringus1 commented Apr 12, 2023

lafrech commented Apr 12, 2023

ringus1 commented Apr 12, 2023

douglas-treadwell commented Jun 9, 2017 •

edited

lgo commented Dec 5, 2017 •

edited

deckar01 commented Dec 11, 2018 •

edited

vgavro commented Dec 20, 2018 •

edited