Pydantic v2 significantly slower than v1 #6748
-
I'm afraid there's not really anything we can tell from that flamegraph. Constructing models can be slower in some cases with Pydantic V2; it's validation that should be much faster with V2. However, constructing models shouldn't take 0.8s unless there are a very large number of them. Without code to review, I can't really be more helpful than that. Some guesses at things that could make building models slow (only guesses, since I can't see the code!):
Other than that, it's hard to say. If you can share the flamegraph but not the code, please share the full HTML flamegraph so we can read the method names etc.
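
To make that build-versus-validation distinction concrete, here is a minimal timing sketch (model and field names are illustrative, not from this thread):

```python
# Sketch: separate the one-time model *build* cost from per-call *validation*
# cost. Model and field names here are illustrative, not from the thread.
import time

from pydantic import BaseModel

start = time.perf_counter()

class Item(BaseModel):  # the core schema is built when the class is defined
    name: str
    price: float

build_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
for _ in range(10_000):
    Item.model_validate({"name": "widget", "price": 1.5})
validate_ms = (time.perf_counter() - start) * 1000

print(f"build: {build_ms:.2f} ms, 10k validations: {validate_ms:.2f} ms")
```

The build cost is paid once per model class at import time, which is why codebases with many models feel it most at startup.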
-
Thanks for the quick reply!
-
Might be related to #6768.
-
Thanks @samuelcolvin, definitely looks related. I also noted the high `_walk` call count.
-
#6823 might help you.
-
I have the same problem on armv7 (where the slowdown is relevant and noticeable) and could provide a code base that consists mostly of nested Pydantic models. I've read issue #6768 about FastAPI and just wanted to provide an example without it. Our data-lib startup slowed down by more than 60% just by switching to Pydantic v2. Software used:
How the results were obtained:

```shell
sudo python3 -X importtime -c 'from shepherd_core.data_models.task import EmulationTask' 2> importtime.log
```
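
For anyone digging through such a log, here is a minimal sketch for summarizing it (assuming CPython's standard `-X importtime` stderr format of `import time: <self_us> | <cumulative_us> | <module>` lines):

```python
# Sketch: rank imports from a `python -X importtime` log by cumulative time.
# Assumes CPython's standard stderr format:
#   import time: <self_us> | <cumulative_us> | <module>
import re

rows = []
with open("importtime.log") as f:
    for line in f:
        m = re.match(r"import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s*(.+)", line)
        if m:  # the "self [us] | cumulative" header line doesn't match
            cum_us = int(m.group(2))
            rows.append((cum_us, m.group(3).strip()))

# Nested imports are indented in the module column; stripping the indent and
# sorting by cumulative microseconds is enough to spot the heavyweights.
for cum_us, module in sorted(rows, reverse=True)[:20]:
    print(f"{cum_us / 1000:8.1f} ms  {module}")
```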
-
I am a user of githubkit, which uses Pydantic under the hood. It's gotten so slow (and memory hungry!) with Pydantic v2 that we can't update. We run githubkit on Airflow, which executes a new Python process for every task, so a 20+ second startup overhead (for a 5-second task) is unacceptable. Is there some way to cache the built models so they don't have to be re-parsed? As those models only change on githubkit version upgrades, I'd ideally like to skip almost all of that CPU usage and just load the parsed information from disk.
-
I've been researching the startup duration of a simple Telegram echo bot created with aiogram. I observed that it requires approximately 2200 ms to initiate, notably longer than a functionally equivalent bot developed using pyTelegramBotAPI, which starts in just about 100 ms. Upon further analysis, I discovered that the primary factor contributing to this extended startup time in the aiogram version is Pydantic: the models used in aiogram are quite extensive, and their initialization is a time-consuming process. Cold starts in serverless chatbots cause significant response delays, adversely affecting user experience and scalability. Minimizing these startup times is crucial for maintaining efficient, cost-effective, and responsive bot interactions.
-
In my experience with complex JSON schema validation, I've always found that a "precompiling" or "codegen" step can often help: perform all the load-time class creation once, then ship the compiled .py file containing "flat" classes that can validate, etc. In other words, an optional precompiler could vastly speed up large, complex codebases. If someone has an example of a large, complex set of Pydantic models that take a long time to load, I could try my hand at writing a Pydantic compiler.
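
Somewhat related, though not the same as a codegen step: recent Pydantic 2.x releases have a `defer_build` config option that postpones schema construction until first use, moving the build cost off the import path. A minimal sketch (the exact behavior of `defer_build` has varied across 2.x versions, so treat this as an assumption to verify; names are illustrative):

```python
# Sketch: defer schema construction until first use so imports stay cheap.
# Assumes a recent Pydantic 2.x where ConfigDict supports defer_build;
# model and field names are illustrative.
from pydantic import BaseModel, ConfigDict

class Event(BaseModel):
    model_config = ConfigDict(defer_build=True)

    name: str
    payload: dict[str, int]

# The core schema is built lazily: importing the defining module is cheap,
# and the one-time build cost is paid here, on first validation.
event = Event.model_validate({"name": "ping", "payload": {"count": 1}})
```

This only shifts the cost rather than eliminating it, which is why it mainly helps CLI/serverless workloads that import many models but validate few of them per run.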
-
Noticing this on my side as well; MVP below. Just importing FastAPI triggers the cost:

```python
import cProfile
import io
import pstats
import time
from pstats import SortKey

def test_importing():
    pr = cProfile.Profile()
    pr.enable()
    start_time = time.time()
    import fastapi  # noqa: F401 -- the import itself is what we're measuring
    end_time = time.time()
    pr.disable()
    # Print the profile sorted by cumulative time to see where the import goes.
    s = io.StringIO()
    ps = pstats.Stats(pr, stream=s).sort_stats(SortKey.CUMULATIVE)
    ps.print_stats()
    print(s.getvalue())
    print(f"Time taken: {end_time - start_time}")

test_importing()
```

Testing with
Testing with
-
Is there a plan to improve v2 performance here? It is still an order of magnitude slower than v1 for us, preventing us from upgrading.
-
Indeed, we definitely want to improve the import time and model build time in v2. This will increasingly be a priority for us. I'll work on some concrete plans for improvements in the coming weeks and update our roadmap!
-
I can't post the code as it's internal, but I'll try my best to describe and provide what information I can.
We have a heavily nested, big object. Some of the child objects have custom validators, and I believe most if not all custom objects in the model inherit from BaseModel.
After working through all the breaking changes in v2, running the same code gives me a significantly higher runtime than with v1: around two times slower, depending on what exactly I do.
From trying some things, it looks like the imports are slower: I'm running a (Typer-based) CLI which takes around 0.8 seconds for `--help` in v1 and 1.9 seconds in v2. It doesn't initialize the giant object, but it does import it. I'm not an expert on flamegraphs, but from what I can see, v2 makes significantly more calls.
[flamegraph screenshots: v1 and v2]
I'm trying to figure out how best to approach this and further test what exactly takes longer. Since I haven't made any significant changes to the code, I'd expect it to be at the very least on par with v1.
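
One quick way to check whether the import itself dominates is to time the model module's import in isolation; a minimal sketch, where `myapp.models` is a hypothetical placeholder rather than a name from this thread:

```python
# Sketch: time just the import of the module that defines the nested models.
# "myapp.models" is a hypothetical placeholder, not a name from this thread.
import time

start = time.perf_counter()
import myapp.models  # noqa: F401 -- all model classes are built during import
print(f"import took {time.perf_counter() - start:.2f}s")
```

Running this under both v1 and v2 of Pydantic should show whether the gap comes from model building at import time or from something else in the CLI path.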