Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move the runner into a separate package with minimal dependencies #557

Open
1 task done
kshpytsya opened this issue Mar 3, 2024 · 5 comments
Open
1 task done
Labels
enhancement New feature or request good first issue Good for newcomers Packaging Related to packaging and releasing

Comments

@kshpytsya
Copy link

Is there an existing proposal for this?

  • I have searched the existing proposals

Is your feature request related to a problem?

The system I am currently working on involves some confidential customer data to which regular developers do not have access. Due to the complex nature of ML algorithms and libraries involved, sometimes we encounter anomalous memory usage which we cannot easily reproduce with our test data set. I am considering integration of some memory profiling mechanism, possibly involving memray which would be selectively enabled in production environment to help us to get insights into problematic cases without requiring access to customer data.

For multiple reasons (security, reliability, docker image size to name a few) we try to minimize the number of our dependencies. Memray has a number of dependencies which are unnecessary for trace collection:

memray/setup.py

Lines 91 to 96 in 578e02d

install_requires = [
"jinja2 >= 2.9",
"typing_extensions; python_version < '3.8.0'",
"rich >= 11.2.0",
"textual >= 0.41.0",
]

and those have their own dependencies as well.

Describe the solution you'd like

It would be great to split memray into two packages, tentatively named memray-core and memray. The former would only be able to perform collection, and the later would do everything memray does today.

Alternatives you considered

Another possibility would be having all the dependencies not related to memray.Tracker to be under "extras" but that would probably be a nuisance for most ordinary users who would want a simple pip install memray to install everything.

@kshpytsya kshpytsya added the enhancement New feature or request label Mar 3, 2024
@godlygeek
Copy link
Contributor

Hm. It's true that Jinja2, Rich, and Textual aren't needed for collecting profile data, but they are needed for generating reports from the profile data, and as a general rule, we expect the reporters to be run on the same system as the report was generated on... In the case of --native mode reports, it can be extremely difficult to reproduce an environment on a different machine where the instruction pointers and shared library names stored in the capture file are able to be interpreted correctly. Because of that, the idea of a memray-core package with no support for interpreting the capture file seems problematic. It encourages a workflow that's likely to cause at least a lot of friction, and that may prevent certain types of reports from being generated from the captured file at all.

@kshpytsya
Copy link
Author

I do realize it may need extra refactoring but I do believe it would be possible to implement the collection phase (including the extraction the required information from the running environment, e.g. from shared libraries) without those libraries, and make the resulting files completely portable.
Of course, the code is yours, but to me this would seem a better architecture.

To clarify, I envision roughly the following flow:

  1. with memray_core.Tracker: ... exactly what it does today. memray.Tracker would be just a re-export of memray_core.Tracker.
  2. memray_core.augment(<list of files produced by Tracker>, ...) (tentative name) either produces a single or multiple files containing all the additional data required to make files from step 1 portable, or just aggregates everything into a single file containing both data from step 1, and all the additional data.
  3. memray flamergraph|table|tree|parse|summary|stats <data from step 2> can be executed on a different machine (possibly running different OS and Python version).

I am unsure whether a CLI for steps 1 and 2 would be desired/useful.

@pablogsal
Copy link
Member

pablogsal commented Mar 5, 2024

Of course, the code is yours, but to me this would seem a better architecture.

It's better in the sense that's more modular and allows to do what you are suggesting but has other significant drawbacks:

  • It complicates a lot publishing as not only we need ti publish now more packages but we need to account for the possibility of someone having a incompatible version between the two.
  • It complicates maintaining the package as we need to ensure that the two pieces remain logically separated and there is no hard coupling.
  • We need a nontrivial amount of work to be able to produce self-contained files with memray augment or whatever other name we choose for this.
  • The workflow is more complicated as now there are more kind of files with all the different options and we need to account for how to treat all of them in the new command and all the reporters.

In any case, we will discuss this as there is some potential gain here but my guess is that we will unfortunately need to reject this suggestion as this complicates the maintenance and release and also entails some extra development to keep a workflow that is not the most common one.

@pablogsal
Copy link
Member

pablogsal commented Mar 6, 2024

Update: we agreed that this makes sense to do and we will investigate ways to handle the complexity. Not sure how much it will take but we will probably do some form of this 👍

@sarahmonod sarahmonod added good first issue Good for newcomers Packaging Related to packaging and releasing labels May 20, 2024
@sarahmonod
Copy link
Contributor

After talking about it, we decided we would take the following approach:

  1. Make an empty memray package that contains NO code, and that depends upon memray_core[all] of the exact same version (so memray==a.b.c depends on memray_core==a.b.c)
  2. Make a new package called memray_core that contains all of the code, but with optional dependencies, so that we can separate dependencies into multiple groups: html-reporters, textual-reporters, and all (that third one containing the union of the two kinds of reporters)
  3. Change memray to be able to run with either no reporter (so only the tracking API + CLI), or only one reporter, or none.

That last one means we also need to:

  1. Change the listing of all reporters (by the CLI) to be dynamic, on a plugin-like system that can handle all combinations of reporters being installed
  2. Change the live reporter to exit and fail with a meaningful error message if the textual-reporters aren't installed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers Packaging Related to packaging and releasing
Projects
None yet
Development

No branches or pull requests

4 participants