Making AstroidManager interactable from multiple processes #2048

Open
DanielNoord opened this issue Mar 9, 2023 · 8 comments

@DanielNoord
Collaborator

I know, another issue about multiprocessing...

I was reading up on multiprocessing and sharing caches among processes, and it seems as if we need something like multiprocessing.SyncManager to allow a cache to be shared between multiple processes.
I think it would be worthwhile to explore whether we can make AstroidManager a SyncManager that needs to be instantiated by whoever needs it rather than having it be a singleton.
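
For illustration, roughly what I have in mind (a sketch with placeholder strings rather than real astroid trees, which would themselves need to be picklable to live in the proxy):

from multiprocessing import Manager, Pool

def parse_one(args):
    # `cache` is a DictProxy to the dict hosted by the SyncManager process,
    # so every worker reads and writes the same mapping.
    path, cache = args
    if path not in cache:
        cache[path] = f"tree for {path}"  # stand-in for a real astroid tree
    return cache[path]

if __name__ == "__main__":
    with Manager() as manager:            # multiprocessing.Manager() returns a SyncManager
        shared_cache = manager.dict()     # shared, proxy-backed cache
        files = ["a.py", "b.py", "a.py"]  # placeholder file list
        with Pool(2) as pool:
            pool.map(parse_one, [(f, shared_cache) for f in files])
        print(dict(shared_cache))         # only two entries: 'a.py' was already cached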

However, before spending considerable time exploring this, I want to see whether others have already tried this or have other ideas about allowing pylint and astroid to "parse modules and nodes in multiple processes while keeping a cache between those processes".

Tagging @jacobtylerwalls in particular, as they have tinkered with this as well.

@jacobtylerwalls
Member

If it were me, I'd want to just step through and profile the entire current implementation. I've yet to do that, so I really don't have a forecast at this point. My worry is that we're building a lot of unnecessary astroid ASTs. If five Python modules import pandas, are we building an AST for pandas five times?
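
For example, one way to get that picture (a rough sketch; the target path is a placeholder):

import cProfile
import pstats

from pylint import lint

profiler = cProfile.Profile()
profiler.enable()
lint.Run(["path/to/some_package"], exit=False)  # placeholder target
profiler.disable()

# Restrict the report to astroid frames to see how often trees get rebuilt,
# e.g. how many calls land in ast_from_file / ast_from_module_name.
pstats.Stats(profiler).sort_stats("cumulative").print_stats("astroid")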

@jacobtylerwalls
Member

jacobtylerwalls commented May 14, 2023

Thanks for the link. I think with the SyncManager we'll run into trouble with unpicklable objects. I did see this, though:

Better to inherit than pickle/unpickle
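
Roughly what that guideline would look like for the astroid cache (a sketch, assuming the fork start method, which is POSIX-only):

import multiprocessing as mp

from astroid.manager import AstroidManager

def in_cache(module_name):
    # Runs in a forked worker: the cache warmed by the parent is inherited
    # with the process image, so nothing is pickled or sent over a pipe.
    return module_name in AstroidManager().astroid_cache

if __name__ == "__main__":
    AstroidManager().ast_from_module_name("json")  # warm the cache in the parent
    ctx = mp.get_context("fork")
    with ctx.Pool(2) as pool:
        print(pool.map(in_cache, ["json", "never.parsed"]))  # [True, False]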

Do you mind if I give this a spin?

jacobtylerwalls self-assigned this May 14, 2023
@DanielNoord
Collaborator Author

I was able to pickle most objects with cloudpickle.
Not sure if passing the brain helps that much? Isn't the issue more about having to pass the astroid_cache?
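
Roughly the kind of round-trip I mean (a sketch, not the exact code I used; "most" because some trees still refuse to serialize):

import astroid
import cloudpickle  # third-party: pip install cloudpickle

module = astroid.parse("import json\nvalue = json.loads('{}')\n")
try:
    payload = cloudpickle.dumps(module)   # plain pickle tends to struggle with these
    restored = cloudpickle.loads(payload)
    print(type(restored).__name__, f"{len(payload)} bytes")
except Exception as exc:                  # the trees that still don't round-trip
    print("could not pickle this tree:", exc)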

@jacobtylerwalls
Member

Not sure if passing the brain helps that much? Isn't the issue more about having to pass the astroid_cache?

That's what I was referring to:

brain: AstroidManagerBrain = {
    "astroid_cache": {},

@jacobtylerwalls
Member

Perhaps:

  • parallelize _get_asts()
  • collect those results into one set (where results = both the trees and the astroid cache formed along the way)
  • send that to the workers when parallelizing the lint runs

That way worker processes never have to update shared state. 🤔
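
As a sketch of those three steps (stand-in names on my part, with cloudpickle doing the serialization as mentioned above; lint_one is a placeholder for pylint's per-file check):

import cloudpickle  # third-party; some astroid trees don't survive plain pickle
from concurrent.futures import ProcessPoolExecutor

import astroid
from astroid.manager import AstroidManager

def build_ast(path):
    """Phase 1 worker: parse one file and ship the tree back to the parent."""
    tree = AstroidManager().ast_from_file(path)
    return tree.name, cloudpickle.dumps(tree)  # serializing is the costly part

def lint_one(job):
    """Phase 2 worker: seed the local cache instead of re-parsing, then lint."""
    modname, all_blobs = job
    cache = {name: cloudpickle.loads(blob) for name, blob in all_blobs.items()}
    AstroidManager().astroid_cache.update(cache)
    tree = cache[modname]
    ...  # run the checkers against `tree` (elided here)

if __name__ == "__main__":
    files = [astroid.__file__]                    # stand-in for pylint's file list
    with ProcessPoolExecutor() as pool:
        blobs = dict(pool.map(build_ast, files))          # 1. parallel _get_asts()
    jobs = [(name, blobs) for name in blobs]              # 2. one collected set
    with ProcessPoolExecutor() as pool:
        list(pool.map(lint_one, jobs))                    # 3. parallel lint runs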

@DanielNoord
Collaborator Author

Perhaps:

  • parallelize _get_asts()
  • collect those results into one set (where results = both the trees and the astroid cache formed along the way)
  • send that to the workers when parallelizing the lint runs

That way worker processes never have to update shared state. 🤔

I did this, but it gives only a minimal gain. The creation of an astroid.Module for a module via ast_from_file isn't the main performance bottleneck. I created an astroidd daemon (for lack of a better name) but saw almost no significant performance gain. Another issue is that you can't pickle all astroid.Module objects, and even when you can, almost all of the performance benefit of "caching" ast_from_file is lost during pickling.

What we should "cache"/parallelize is node.infer(), as that is where we start inferring ImportNodes, which seem to take the most time currently. However, during _get_asts we don't really know which ImportNodes we will see, and it also seems like bad design to just follow all ImportNodes at the start even though we might not need them.
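
A small, self-contained illustration of what I mean (a sketch with the stdlib json module standing in for a heavier import):

import time

import astroid

SRC = "import json\nvalue = json.loads('{}')\n"

start = time.perf_counter()
module = astroid.parse(SRC)
print(f"build the tree:   {time.perf_counter() - start:.4f}s")

name = module.body[-1].value.func.expr   # the `json` Name inside json.loads(...)
start = time.perf_counter()
inferred = next(name.infer())            # following the import makes astroid parse json too
print(f"infer the import: {time.perf_counter() - start:.4f}s")
print(inferred)                          # the astroid Module built for `json`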

@jacobtylerwalls
Member

jacobtylerwalls commented Jun 15, 2023

Do you have an experimental branch where you tried this stuff? If the bottleneck really is somewhere else, it would be instructive to profile it and find out where it's lurking instead. EDIT: ah, you did say you expect the bottleneck to be in inferring import nodes.

@DanielNoord
Collaborator Author

Sadly I don't, as the code wasn't achieving what I was trying to do, so I deleted it 😅
