add support for writing output to different file #31

Open
wants to merge 3 commits into
base: master

Conversation

JacksonBurns

Hello, and thanks for the great tool!

I am working on a package that generates plots for the user. As implemented, pylustrator would edit my package's own source code when save... is used. That won't be useful once the package is distributed on PyPI: the edits would be buried in an environment folder and, because of how the package is structured, could not be run.

This PR contains a proof of concept for having pylustrator write a separate, standalone file that can be run to regenerate the plot independently of the source code.

Run temp.py, click save... in the pylustrator UI, and then you should be able to run thisisatest.py to regenerate the plot.

Thoughts on building up this implementation or continuing at all? I have some doubts as to whether or not this would work on plots which are not line plots, but this might still be workable.

@rgerum
Owner

rgerum commented Aug 4, 2021

Hmm, being able to save the whole plot to a new, clean Python file seems interesting. It might even be interesting to then create a minimal Python file that tries to condense all descriptions.
So there, e.g., the line width could be edited directly into the plt.plot call and would not need to be in a separate pylustrator-generated code part, as everything is pylustrator generated.
But the problem here is, I think, that it would need to support all or at least a great deal of the matplotlib features. And I think some functions, like plt.errorbar, do not create matplotlib artists that still know about the data. So pylustrator would need to inject some tracing code into the plt.errorbar function.
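
As a point of reference for the errorbar concern, here is a minimal sketch of what `errorbar` leaves behind. It assumes a recent matplotlib and the non-interactive Agg backend, and the variable names are illustrative. The data line keeps x and y, while the error magnitudes survive only indirectly, as the endpoints of the bar segments, so recovering them takes extra work.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 2.0])
yerr = np.array([0.1, 0.2, 0.3])

fig, ax = plt.subplots()
container = ax.errorbar(x, y, yerr=yerr)

# An ErrorbarContainer bundles (data_line, caplines, barlinecols).
data_line, caplines, barlinecols = container.lines

# The data line is an ordinary Line2D and still holds x and y.
recovered_x = np.asarray(data_line.get_xdata())
recovered_y = np.asarray(data_line.get_ydata())

# The bars are a LineCollection; each vertical segment runs from
# (x, y - err) to (x, y + err), so half its height is the error.
segments = np.array(barlinecols[0].get_segments())
recovered_yerr = (segments[:, 1, 1] - segments[:, 0, 1]) / 2
```

This only covers symmetric y errors; asymmetric errors, x errors, and limit markers would each need their own handling.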

@JacksonBurns
Author

So there, e.g., the line width could be edited directly into the plt.plot call and would not need to be in a separate pylustrator-generated code part, as everything is pylustrator generated.

I actually think the easiest way forward would be to leave all plot customization in pylustrator -- it already does an excellent job tracking plot changes, so I don't think we need to reinvent the wheel. The ideal workflow would look like this:

  1. User writes code to generate the X and y data that they want to plot.
  2. User does no formatting of any kind and only calls plt.plot(X,y).
  3. In the pylustrator window they make all changes of interest.
  4. The exported code would then create a standalone file.

But the problem here is, I think, that it would need to support all or at least a great deal of the matplotlib features.

Agreed that this would be difficult. If the above method were to work, though, all we would need to do is get the "data" out of all the conceivable plot types. For this simple example, it is pretty much fully functional. For examples like the one you mentioned with error bars, though, I am not sure how to go about it, since they don't seem to leave any easily accessible record of the data plotted.

Alternate approach -- with my proposed layout, the user is presumably importing a single library (like seaborn, etc.) and calling a plotting tool (e.g. seaborn.heatmap). If the signature of plt.figure() included an optional argument for specifying the required imports and the call to the plotting function, pylustrator could simply write this to the output file. See the latest commit for an example of this in complex_example.py.
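
For the simple line-plot case, getting the data back out can be sketched as below. `lines_to_script` is a hypothetical helper written for illustration, not code from this PR; it relies only on `Axes.get_lines()` and `Line2D.get_xdata()`/`get_ydata()`, and it ignores all styling.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def lines_to_script(ax):
    """Return Python source that replots every Line2D on `ax` (hypothetical helper)."""
    out = ["import matplotlib.pyplot as plt", ""]
    for line in ax.get_lines():
        # Convert to plain floats so the repr is valid Python source.
        xdata = [float(v) for v in line.get_xdata()]
        ydata = [float(v) for v in line.get_ydata()]
        out.append(f"plt.plot({xdata!r}, {ydata!r})")
    out.append("plt.show()")
    return "\n".join(out) + "\n"


fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
script = lines_to_script(ax)
```

Plot types that do not produce `Line2D` artists (heatmaps, bars, error bars) would not be captured by this approach, which is the limitation discussed here.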

@rgerum
Owner

rgerum commented Aug 4, 2021

Hmm, your example looks a bit bulky, as the user essentially has to write their code twice. But maybe the question here is which use case we actually want to optimize for.

I thought it might be nice to have a way to "serialize" a whole matplotlib plot, to be used either from the pylustrator interface (e.g. save to a different file) or just as a function call: pylustrator.export("new_script.py"). That could also be quite interesting if you just want a simple script file that reproduces your figure without the preprocessing the user presumably has in their original script.

@JacksonBurns
Author

I can refactor this into a new method to export the changes script to a Python file, sure. That sounds quite helpful. Will open a new PR, though, to keep things separated.

The use cases I am envisioning are fully reproducible plots where the source code that generates the data is (a) too slow to re-run constantly or (b) 'hidden' in a package, like in my case. For both of these cases, I agree that bulk is bad, but I think some bulk might be OK. Because the code is either (a) only going to be run once or (b) going to be written by someone else and 'hidden', the implementation specifics shouldn't be too much of a pain.
Two ideas:

  1. To serialize the entire interface, could we just pickle the interface + figure and load it from there? I have not done this before so pardon a potentially naive question.
  2. Instead of requiring the user to copy their code, we can use decorators or pragmas. This would look something like this:
import numpy as np; np.random.seed(0)
uniform_data = np.random.rand(10, 12)
import matplotlib.pyplot as plt
import pylustrator
pylustrator.start()
plt.figure(
    output_file="thisisatest.py"  # argument proposed in this PR
)
# pragma: pylustrator start    (alternatively, a @pylustrator_start decorator)
import seaborn as sns; sns.set_theme()
import matplotlib.pyplot as plt
ax = sns.heatmap(uniform_data)
# pragma: pylustrator end      (alternatively, a @pylustrator_end decorator)
plt.show()

I think this second approach could be quite useful.
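
On the pickling question in idea 1: matplotlib figures themselves can be pickled, as the sketch below shows under the Agg backend. Whether pylustrator's own interface state pickles cleanly is not something this thread establishes, so the sketch covers the figure only.

```python
import pickle

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], linewidth=3)

blob = pickle.dumps(fig)       # serialize the figure and its artists
restored = pickle.loads(blob)  # rebuild an equivalent figure

# Data and styling of the line survive the round trip.
restored_line = restored.axes[0].get_lines()[0]
```

A pickle is a binary blob rather than an editable script, so it would not give users code they can read or tweak, which is the goal of the standalone file.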

@rgerum
Owner

rgerum commented Aug 4, 2021

For user interfaces that should export plotting code, what I have done is wrap the plot script inside a function. Since Python introspection can return the source code of a function, this should be the best solution for your use case, as it would even let you export the code with its comments.
And I think that might be cleaner than adding these pragma comments.

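import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt  # needed at module level for plt.figure / plt.show
import pylustrator

uniform_data = np.random.rand(10, 12)
pylustrator.start()

def do_plot(uniform_data):
    # imports live inside the function so the exported source is self-contained
    import seaborn as sns; sns.set_theme()
    import matplotlib.pyplot as plt
    ax = sns.heatmap(uniform_data)

# output_file and code_function are the arguments proposed in this thread
plt.figure(
    output_file="thisisatest.py",
    code_function=[do_plot, uniform_data],
)
do_plot(uniform_data)
plt.show()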

@rgerum
Owner

rgerum commented Aug 4, 2021

But I think in general these are two slightly different use cases:

  1. You want the user to export some plot from your user interface/package. This code could then be prepared in advance by the package author and maybe nicely formatted with comments etc.
  2. You just want to be able to dump the output of an arbitrary script that generated a matplotlib plot into a script that creates this plot.

@rgerum
Owner

rgerum commented Aug 4, 2021

Code generation for a plot function could look like this (I have used a similar function once):

import inspect
import textwrap

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def value_create(key, value):
    """Return source code that recreates `value` under the name `key`."""
    if isinstance(value, np.ndarray):
        # tolist() avoids the "..." truncation that repr() applies to large arrays
        return f"import numpy as np\n{key} = np.array({value.tolist()!r})\n"
    elif isinstance(value, pd.DataFrame):
        # to_csv() writes the index as the first column, so read it back with index_col=0
        return f"import pandas as pd\nimport io\n{key} = pd.read_csv(io.StringIO('''{value.to_csv()}'''), index_col=0)\n"
    # repr() also quotes and escapes plain strings correctly
    return f"{key} = {value!r}\n"


def execute(func, *args, **kwargs):
    """Run `func`, then return a standalone script that reproduces the call."""
    func(*args, **kwargs)
    # drop the "def ...:" line (assumes a one-line signature and no decorators)
    body = inspect.getsource(func).split("\n", 1)[1]
    code = textwrap.dedent(body)
    # prepend one assignment per keyword argument; positional arguments are not exported
    for key, value in kwargs.items():
        code = value_create(key, value) + code
    return code


def plot(uniform_data, data_frame, color):
    # imports stay inside the function so they end up in the exported code
    import seaborn as sns
    sns.set_theme()
    import matplotlib.pyplot as plt
    plt.subplot(121)
    ax = sns.heatmap(uniform_data)
    plt.subplot(122)
    plt.plot(data_frame["a"], data_frame["b"], color=color)


uniform_data = np.random.rand(10, 12)
data_frame = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
code = execute(plot, uniform_data=uniform_data, data_frame=data_frame, color="red")
print(code)
plt.show()
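
One detail worth checking when a data frame is embedded as CSV text, as value_create does: `to_csv()` writes the index as an unnamed first column, so it has to be read back with `index_col=0` to round-trip cleanly. A small sketch of that check, using only pandas calls and illustrative variable names:

```python
import io

import pandas as pd

data_frame = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
csv_text = data_frame.to_csv()  # the index becomes an unnamed first column

# Reading back naively turns the old index into a data column named "Unnamed: 0".
naive = pd.read_csv(io.StringIO(csv_text))

# Telling read_csv that column 0 is the index restores the original layout.
round_trip = pd.read_csv(io.StringIO(csv_text), index_col=0)
```

Writing the CSV with `to_csv(index=False)` would be an alternative when the index carries no information.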

@JacksonBurns
Author

I will get this going into a working example asap

@JacksonBurns
Author

Hi @rgerum!

I have completed a working version of the standalone file writing. Run temp.py and take a look when you can -- it should generate sample_pylustrator_output.py from scratch, which you can then run on its own to recreate the plot.

The way this works is that if output_file is given a name and reqd_code is provided, all those objects will be included in the new output and it will be a standalone, runnable file. If reqd_code is not provided but output_file is still specified, the output to the file will only contain the change code written by pylustrator, i.e. the first part of my earlier comment.

This implementation also assumes that the reqd_code is provided as: [function, argument_1, argument_2, ...]
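
The two cases described above could be sketched roughly as follows. `build_standalone_script` is a hypothetical helper for illustration, not the code in this PR, and it assumes every argument has a repr that is valid Python source (true for the list used here, not for NumPy arrays in general).

```python
import inspect
import textwrap


def build_standalone_script(change_code, reqd_code=None):
    """Assemble the output-file text (hypothetical helper, not the PR's code)."""
    if reqd_code is None:
        # No plotting code supplied: emit only the pylustrator-generated changes.
        return change_code
    func, *args = reqd_code  # [function, argument_1, argument_2, ...]
    try:
        func_source = textwrap.dedent(inspect.getsource(func))
    except (OSError, TypeError):
        # Source is unavailable, e.g. for functions defined in an interactive session.
        func_source = f"# source of {func.__name__} unavailable\n"
    arg_list = ", ".join(repr(a) for a in args)  # assumes eval-able reprs
    call = f"{func.__name__}({arg_list})\n"
    return func_source + "\n" + call + change_code


def do_plot(values):
    import matplotlib.pyplot as plt
    plt.plot(values)


# Stand-in for the change code pylustrator would generate.
changes = "import matplotlib.pyplot as plt\nplt.gca().lines[0].set_linewidth(3)\n"

only_changes = build_standalone_script(changes)
standalone = build_standalone_script(changes, [do_plot, [1, 2, 3]])
```

Unpacking with `func, *args` keeps the `[function, argument_1, ...]` convention in one place, so a later switch to keyword arguments would only touch this helper.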

Sorry for the massive diffs now. I had my auto-formatter running, and it changed a bunch of docstrings, indentation styles etc. I do think it would be a good idea to introduce some more uniform formatting across the repo, though. Black is my personal favorite.

@JacksonBurns
Author

Hi @rgerum just checking in on this again -- how does the PR look?


2 participants