export_report is very slow #2771

Open
jonahpearl opened this issue Apr 26, 2024 · 2 comments
Labels
exporters Related to exporters module performance Performance issues/improvements

Comments

@jonahpearl

Hi there — thank you all for maintaining this suite of great tools! I am trying to write a script that handles pre-processing / sorting / post-processing all at once.

Describe the issue

I've noticed that the export_report function is very slow — almost 30 seconds per unit. If I have a few hundred units, that's an extra hour, which is in some cases more than the entire pre-processing + sorting!

My recording is 1 hour long, and the units seem to have relatively low firing rates (1 - 20) so I don't think that's the issue. This is on version 0.100.4, so I'm willing to be told that these issues have been solved, but just poking around the code a bit, it doesn't look like it.

Reproducing

I can reproduce this by simply re-loading the waveform extractor from a folder like we = si.load_waveforms(waveform_dir), and then running export_report(we, output_folder=qc_dir, **job_kwargs). Everything is fast except generating the per-unit plots at the end.
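One way to confirm which step dominates is to wrap each stage in a small timer. The sketch below uses only the standard library; `load_step` and `plot_step` are hypothetical stand-ins for the real loading and per-unit plotting calls, not SpikeInterface functions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


# Hypothetical stand-ins for the real steps (loading, per-unit plotting).
def load_step():
    time.sleep(0.01)


def plot_step():
    time.sleep(0.05)


timings = {}
with timed("load", timings):
    load_step()
with timed("plot", timings):
    plot_step()

# Report the stage that took the longest.
slowest = max(timings, key=timings.get)
print(f"slowest step: {slowest} ({timings[slowest]:.3f} s)")
```

Replacing the two placeholder functions with the actual calls would give a per-stage breakdown without needing a full profiler.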

Ideas

I dug around a bit, and I have two clues:

  • the slowest part seems to be sw.plot_unit_waveforms, which takes about 20 seconds on its own. I can't figure out why it's slow, as loading the templates / waveforms (i.e. we.get_waveforms(60)) is very fast. I guess matplotlib is slow to plot hundreds of lines? Speaking for myself, I find the smear of all the raw waveforms totally uninformative (see the attached image), and I would rather have a mean +/- std, which I assume would also be much faster.
  • for the amplitudes part of the plot: running something like sw.plot_amplitudes(we, unit_ids=[60]) directly takes roughly the same amount of time (~13 seconds) as just loading the data with this line:
    amplitudes = sac.get_data(outputs="by_unit")

    I see that the point of loading all the data is to allow the widget to plot arbitrarily many units at once, but when we plot only one unit per widget instance, many times over, this is a huge time suck. I admit I don't really see why loading the amplitude data takes so long: if I just run np.load("/path/to/amplitude_segment_0.npy"), it's very fast (< 1 sec).
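To illustrate the kind of per-unit loading I have in mind, here is a rough sketch. It assumes two aligned 1-D .npy files, one amplitude per spike and one unit index per spike; the second file name and the helper `load_unit_amplitudes` are hypothetical, and I don't know whether this matches how the extension stores or splits its data internally.

```python
import os
import tempfile

import numpy as np


def load_unit_amplitudes(amplitude_path, unit_index_path, unit_index):
    """Return the amplitudes of a single unit.

    Assumes (hypothetically) two aligned 1-D .npy files: one amplitude per
    spike, and the unit index of each spike.
    """
    # Memory-map both files so nothing is read until it is indexed.
    amplitudes = np.load(amplitude_path, mmap_mode="r")
    spike_units = np.load(unit_index_path, mmap_mode="r")
    mask = spike_units == unit_index  # boolean mask over all spikes
    return np.asarray(amplitudes[mask])  # in-memory copy of the selected spikes


# Tiny synthetic example written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    amp_path = os.path.join(tmp, "amplitude_segment_0.npy")
    idx_path = os.path.join(tmp, "spike_unit_index.npy")  # hypothetical name
    np.save(amp_path, np.array([-50.0, -80.0, -55.0, -90.0, -60.0]))
    np.save(idx_path, np.array([0, 1, 0, 1, 0]))
    unit0 = load_unit_amplitudes(amp_path, idx_path, unit_index=0)
```

The idea is only that selecting one unit with a mask avoids building a per-unit dictionary for every unit each time a single-unit widget is created.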

So in summary, I would suggest:
-- have an option to plot mean/std instead of raw waveforms, or just not show waveforms at all, in the export_report function.
-- figure out a way to make loading unit amplitudes as fast as loading the waveform data.
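A minimal sketch of the mean/std option, assuming single-channel waveforms shaped (num_spikes, num_samples). The function names are hypothetical and this is not how the widget is currently implemented; `ax` is expected to be a matplotlib Axes.

```python
import numpy as np


def waveform_mean_std(waveforms):
    """Collapse (num_spikes, num_samples) waveforms to per-sample mean and std."""
    waveforms = np.asarray(waveforms, dtype=float)
    return waveforms.mean(axis=0), waveforms.std(axis=0)


def plot_mean_std(ax, waveforms, color="C0"):
    """Draw one mean line and one shaded band instead of one line per spike."""
    mean, std = waveform_mean_std(waveforms)
    samples = np.arange(mean.size)
    ax.plot(samples, mean, color=color)
    ax.fill_between(samples, mean - std, mean + std, color=color, alpha=0.3)
    return mean, std


# Synthetic data standing in for one unit on one channel.
rng = np.random.default_rng(0)
fake_waveforms = rng.normal(size=(500, 90))
mean, std = waveform_mean_std(fake_waveforms)
```

With real data this would be called per channel, e.g. on `we.get_waveforms(60)[:, :, channel]`, so matplotlib draws two artists per channel rather than hundreds of lines.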

@zm711 zm711 added exporters Related to exporters module performance Performance issues/improvements labels Apr 26, 2024
@zm711
Collaborator

zm711 commented Apr 26, 2024

Hey @jonahpearl,

I just profiled sw.plot_unit_summary, the last step of export_report, on main, and it took:

%timeit sw.plot_unit_summary(analyzer, unit_id=0)
3.77 s ± 15.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Then I did the same with export_report

%timeit sexp.export_report(test_analyzer2, output_folder='./test', remove_if_exists=True)
2.47 s ± 9.08 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This had five units. When I tested it with only one unit:

%timeit sexp.export_report(test_analzyer, output_folder='./test', remove_if_exists=True)
683 ms ± 9.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

so it seems to scale roughly with the number of units. 100 units * 0.5 sec/unit would be about 1 min. For reference, this was based on a 75-minute recording.
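As a rough check of that estimate, the two export_report timings above (1 unit and 5 units) can be turned into a straight-line fit. This is only a two-point sketch that assumes the cost is a fixed overhead plus a constant per-unit cost; `estimate_seconds` is a hypothetical helper.

```python
# Timings quoted above (seconds) for export_report with 1 and 5 units.
t_one_unit = 0.683
t_five_units = 2.47

# Slope and intercept of the line through the two measurements.
per_unit = (t_five_units - t_one_unit) / (5 - 1)
fixed_cost = t_one_unit - per_unit


def estimate_seconds(num_units):
    """Extrapolate the export time for `num_units` units."""
    return fixed_cost + per_unit * num_units


print(f"{per_unit:.2f} s/unit, ~{estimate_seconds(100):.0f} s for 100 units")
```

That gives a per-unit cost a bit under 0.5 s, so the "about 1 min for 100 units" figure is a slightly conservative round-up.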

Would you be willing to update to main and see if you still have the slowdown in performance?

It's also important to note that for these tests I kept my analyzer in the in-memory format rather than on disk, which is a feature of the SortingAnalyzer that may provide some speed boost.

@samuelgarcia
Member

Hi,
thanks for pointing this out.
Yes, adding more options to remove waveforms (or similar) to speed up plot_unit_summary is a good idea.
Feel free to go in this direction with a PR if you have time for this.

Another good idea would be to use PoolWorker to make the unit figures in parallel, but this is more work...
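The parallel idea could look roughly like the sketch below. It uses the standard library's ProcessPoolExecutor as a stand-in for whatever pool the project would actually use; `render_units` and `fake_render` are hypothetical names, and a real `render_one` would build and save one unit figure (most likely with a non-interactive matplotlib backend in each worker).

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def render_units(unit_ids, render_one, max_workers=4,
                 executor_cls=ProcessPoolExecutor):
    """Call render_one(unit_id) for every unit in a worker pool.

    Returns a dict mapping unit_id -> whatever render_one returned
    (for example the path of the saved figure).
    """
    unit_ids = list(unit_ids)
    with executor_cls(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with unit_ids.
        results = list(pool.map(render_one, unit_ids))
    return dict(zip(unit_ids, results))


def fake_render(unit_id):
    """Hypothetical placeholder: a real version would build and save one figure."""
    return f"unit_{unit_id}.png"


# Quick check with threads; processes would be the choice for real figures.
demo = render_units([3, 7], fake_render, max_workers=2,
                    executor_cls=ThreadPoolExecutor)
```

With the default process pool, `render_one` has to be a picklable module-level function, and the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn workers.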
