Optimise dumping to reduce unnecessary overhead #1649

dsimidzija · 2020-08-16T22:30:31Z

NB: Please take this MR more as a "starting a discussion" than "production-ready" code, because I hacked a lot of things.

When dumping many objects, marshmallow calls the same field methods
over and over, and they return the same values every time. Parts of
this process only need to run once per dump, which significantly
reduces Python method call overhead.

Field.get_serializer returns an optimized serializer for the current
dump operation, avoiding the expensive lookups for properties which will
not change during a single dump (such as data_key, default, etc.).

Also, the default Schema.get_attribute is no longer used, because all
it does is call utils._get_value_for_key(s).
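
To illustrate the idea described above, here is a minimal, self-contained sketch. It is not the code from this PR and does not use marshmallow: `ToyField`, `dump_naive`, and `dump_hoisted` are hypothetical names, and only the `get_serializer` method name is borrowed from the description. It shows per-field values being resolved once per dump instead of once per object.

```python
from types import SimpleNamespace


class ToyField:
    """Hypothetical stand-in for a schema field; not marshmallow's Field."""

    def __init__(self, attribute, data_key=None, default=None):
        self.attribute = attribute
        self.data_key = data_key
        self.default = default

    def serialize(self, obj):
        # Naive path: every call re-reads attribute and default from self.
        return getattr(obj, self.attribute, self.default)

    def get_serializer(self):
        # Hoisted path: read the per-dump constants once and close over them.
        attribute, default = self.attribute, self.default

        def serializer(obj):
            return getattr(obj, attribute, default)

        return serializer


def dump_naive(fields, objects):
    # The output key and the field lookups are recomputed for every object.
    return [
        {(f.data_key or f.attribute): f.serialize(obj) for f in fields}
        for obj in objects
    ]


def dump_hoisted(fields, objects):
    # Build (output key, serializer) pairs once per dump, then reuse them.
    plan = [(f.data_key or f.attribute, f.get_serializer()) for f in fields]
    return [{key: ser(obj) for key, ser in plan} for obj in objects]


fields = [ToyField("name", data_key="fullName"), ToyField("email")]
objects = [
    SimpleNamespace(name="Ada"),
    SimpleNamespace(name="Grace", email="g@example.com"),
]

# Both strategies must produce identical output; only the work per object differs.
print(dump_hoisted(fields, objects) == dump_naive(fields, objects))
```

Whether this pays off in practice depends on how much per-call work can really be hoisted, which is what the benchmarks below try to measure.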

Benchmarks show around 30-35% improvement, which is quite significant even for this hacky patch:

T1: python benchmark.py
  Before: 395.60 usec/dump
  After:  261.04 usec/dump
T2: python benchmark.py --object-count 1000
  Before: 22508.80 usec/dump
  After:  14610.63 usec/dump
T3: python benchmark.py --iterations=5 --repeat=5 --object-count 20000
  Before: 442295.61 usec/dump
  After:  288202.98 usec/dump
T4: python benchmark.py --iterations=10 --repeat=10 --object-count 10000
  Before: 220163.94 usec/dump
  After:  142475.76 usec/dump
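
For reference, the relative improvement can be recomputed from the timings above. This is just arithmetic on the posted numbers; the dictionary labels and the `improvement` helper are made up for this sketch.

```python
# (before, after) timings in usec/dump, copied from the benchmark runs above.
runs = {
    "default": (395.60, 261.04),
    "--object-count 1000": (22508.80, 14610.63),
    "--object-count 20000": (442295.61, 288202.98),
    "--object-count 10000": (220163.94, 142475.76),
}


def improvement(before, after):
    """Return the reduction in time per dump as a percentage of the baseline."""
    return (before - after) / before * 100


results = {label: improvement(before, after) for label, (before, after) in runs.items()}

for label, pct in results.items():
    print(f"{label}: {pct:.1f}% less time per dump")
```

Each run comes out at roughly 34-35% less time per dump.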

My motivation here is that marshmallow is excellent when it comes to schema validation, but according to the benchmarks, there is a lot of overhead in there. The question is, is there a better way of improving serialization performance without sacrificing all the good things about marshmallow?

src/marshmallow/schema.py

deckar01 · 2020-08-18T15:58:18Z

This lines up pretty well with the discussion in #805. I think the main difference is that we also proposed caching logic that depends on the object type. I think the scope of this PR would be a good incremental improvement.

dsimidzija · 2020-08-19T00:29:35Z

I didn't even know about #805, good to know! My ideas were similar: figure out a way to cache values for certain types at least. For example, if I know that a specific field is a string property and it is 100% there, there should be no reason for complicated lookups or type enforcement. I didn't have time to delve deeper into the various Field subclasses, but line_profiler implies a lot of repetition there.

As I'm typing this, I wonder if it would be possible to do something like lima is doing, and compile a single function which dumps the entire object, where each Field subclass could in theory produce the "most optimized" serializer for itself.
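
A rough sketch of what "compile a single function" could look like, assuming the simplest possible case where every field is a plain attribute read. This is not how marshmallow or lima actually implement anything; `compile_dump` and its `field_map` argument are hypothetical.

```python
import keyword
from types import SimpleNamespace


def compile_dump(field_map):
    """Generate one function that dumps an object in a single expression.

    field_map maps output key -> attribute name (hypothetical input format).
    """
    for attr in field_map.values():
        # Only plain attribute names are spliced into the generated source.
        if not attr.isidentifier() or keyword.iskeyword(attr):
            raise ValueError(f"not a valid attribute name: {attr!r}")

    # Produces e.g. "'fullName': obj.name, 'age': obj.age"
    items = ", ".join(f"{key!r}: obj.{attr}" for key, attr in field_map.items())
    source = f"def dump(obj):\n    return {{{items}}}\n"

    namespace = {}
    exec(source, namespace)  # compile the generated function once
    return namespace["dump"]


dump = compile_dump({"fullName": "name", "age": "age"})
result = dump(SimpleNamespace(name="Ada", age=36))
print(result)
```

The generated function has no per-field method calls at all, which is the appeal. The hard part, not shown here, would be letting each Field subclass contribute its own code fragment while keeping validation and custom attribute access intact.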

dsimidzija · 2021-03-16T19:39:21Z

I don't know if this is of any interest, but with some minimal changes, marshmallow can be cythonized with setuptools-cythonize, and together with this MR the performance is around 45% better!

lafrech · 2021-03-16T19:56:07Z

I'm not familiar with cythonize. Does cythonization itself - without this PR - bring a significant improvement?

Is there anything we could do to make it available more easily? Like, could/should we distribute binary packages? Would it be useful to many users or is it a niche?

dsimidzija · 2021-03-16T20:48:12Z

I tested it without this MR a long time ago, so I don't have the numbers, but IIRC it did bring a speedup of around 10% on its own. But I guess that should be examined in more detail.

I'm honestly not sure how niche it is. I started looking into it because I ran into these performance problems when dumping large(ish) datasets with marshmallow, but didn't want to give up the dynamic schemas & validation that it provides. I haven't had the time yet, but I was hoping to look into speeding up individual fields as well; I feel like there is a lot of overhead there which should be easy to eliminate without breaking anything.

deckar01 reviewed Aug 18, 2020

dsimidzija force-pushed the dump-performance-tweaks branch from 4f04fbe to 595b12b · September 25, 2020 15:45

dsimidzija force-pushed the dump-performance-tweaks branch from 595b12b to ee835bf · March 7, 2021 21:07

dsimidzija force-pushed the dump-performance-tweaks branch from ee835bf to df37ea1 · November 21, 2021 14:45

dsimidzija force-pushed the dump-performance-tweaks branch from df37ea1 to 622d8d0 · November 21, 2021 15:11
