export_report is very slow #2771

jonahpearl · 2024-04-26T19:41:39Z

Hi there — thank you all for maintaining this suite of great tools! I am trying to write a script that handles pre-processing / sorting / post-processing all at once.

Describe the issue

I've noticed that the export_report function is very slow — almost 30 seconds per unit. If I have a few hundred units, that's an extra hour, which is in some cases more than the entire pre-processing + sorting!

My recording is 1 hour long, and the units seem to have relatively low firing rates (1 - 20) so I don't think that's the issue. This is on version 0.100.4, so I'm willing to be told that these issues have been solved, but just poking around the code a bit, it doesn't look like it.

Reproducing

I can reproduce this by simply re-loading the waveform extractor from a folder like we = si.load_waveforms(waveform_dir), and then running export_report(we, output_folder=qc_dir, **job_kwargs). Everything is fast except generating the per-unit plots at the end.

Ideas

I dug around a bit, and I have two clues:

The slowest part seems to be sw.plot_unit_waveforms, which takes about 20 seconds on its own. I can't figure out why it's slow, as loading the templates / waveforms (i.e. we.get_waveforms(60)) is very fast. I guess matplotlib is slow to plot hundreds of lines? Speaking for myself, I find the smear of all the raw waveforms totally uninformative, and I would rather have a mean +/- std, which I assume would also be much faster.
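To make the mean +/- std idea concrete, here is a minimal sketch of the reduction step. It uses synthetic data and assumes the waveforms for one unit come as an array shaped (n_spikes, n_samples, n_channels); the function name summarize_waveforms is hypothetical and not part of spikeinterface.

```python
import numpy as np


def summarize_waveforms(waveforms):
    """Collapse (n_spikes, n_samples, n_channels) into per-sample mean and std."""
    waveforms = np.asarray(waveforms, dtype=float)
    mean = waveforms.mean(axis=0)  # shape (n_samples, n_channels)
    std = waveforms.std(axis=0)    # same shape as mean
    return mean, std


# Synthetic stand-in for one unit: 500 spikes, 90 samples, 4 channels.
rng = np.random.default_rng(0)
template = np.sin(np.linspace(0, np.pi, 90))[:, None] * np.array([1.0, 0.5, 0.25, 0.1])
waveforms = template + 0.05 * rng.standard_normal((500, 90, 4))

mean, std = summarize_waveforms(waveforms)
# Band edges to draw instead of 500 individual traces per channel.
lower, upper = mean - std, mean + std
```

With matplotlib, each channel would then need one ax.plot call for the mean and one ax.fill_between call for the band, rather than one line per spike.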

For the amplitudes part of the plot, running something like sw.plot_amplitudes(we, unit_ids=[60]) directly takes roughly the same amount of time (~13 seconds) as just loading the data with this line:

spikeinterface/src/spikeinterface/widgets/amplitudes.py, line 56 in 7d0e1da:

amplitudes = sac.get_data(outputs="by_unit")

I see that the point of loading all the data is to allow the widget to plot arbitrarily many units at once, but when we plot only one unit per instance of the widget, many times over, this is a huge time sink. I admit I don't really see why loading the amplitude data takes so long: if I just run np.load("/path/to/amplitude_segment_0.npy"), it's very fast (< 1 sec).
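As a rough illustration of the single-unit case, the sketch below loads a flat amplitude file with np.load and keeps only the spikes of one unit with a boolean mask. The file layout (one amplitude per spike, plus a parallel array of unit indices) and the helper name load_unit_amplitudes are assumptions for the example, not the library's actual internals.

```python
import os
import tempfile

import numpy as np


def load_unit_amplitudes(amplitude_path, spike_unit_index, unit_index):
    """Return the amplitudes of one unit from a flat per-spike .npy file."""
    # mmap_mode="r" avoids reading the whole file eagerly.
    amplitudes = np.load(amplitude_path, mmap_mode="r")
    mask = spike_unit_index == unit_index
    # Boolean indexing copies only the selected spikes into memory.
    return np.asarray(amplitudes[mask])


# Synthetic data: 10,000 spikes spread over 20 units (assumed layout).
rng = np.random.default_rng(0)
n_spikes = 10_000
all_amplitudes = rng.normal(-50.0, 5.0, size=n_spikes).astype("float32")
spike_unit_index = rng.integers(0, 20, size=n_spikes)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "amplitude_segment_0.npy")
    np.save(path, all_amplitudes)
    unit_amplitudes = load_unit_amplitudes(path, spike_unit_index, unit_index=3)
```

This does one pass over the spike labels for the requested unit only, instead of splitting the whole array by unit on every call.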

So in summary, I would suggest:
-- have an option to plot mean/std instead of raw waveforms, or just not show waveforms at all, in the export_report function.
-- figure out a way to make loading unit amplitudes as fast as loading the waveform data.


zm711 · 2024-04-26T21:05:06Z

Hey @jonahpearl,

I just profiled sw.plot_unit_summary, the last step of export_report, on main, and it took:

%timeit sw.plot_unit_summary(analyzer, unit_id=0)
3.77 s ± 15.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Then I did the same with export_report

%timeit sexp.export_report(test_analyzer2, output_folder='./test', remove_if_exists=True)
2.47 s ± 9.08 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This had five units. When I tested it with one unit only

%timeit sexp.export_report(test_analzyer, output_folder='./test', remove_if_exists=True)
683 ms ± 9.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

so it seems to scale roughly linearly with the number of units. At about 0.5 sec/unit, 100 units would take about 1 min. For reference, this was based on a 75-minute recording.
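The scaling estimate can be worked through from the two export_report timings reported above. This is only a two-point linear fit, assuming the cost is a fixed overhead plus a constant per-unit time; the variable and function names are illustrative.

```python
# export_report timings reported above, in seconds.
t_one_unit = 0.683
t_five_units = 2.47

# Two-point linear fit: total = fixed + per_unit * n_units.
per_unit = (t_five_units - t_one_unit) / (5 - 1)
fixed = t_one_unit - per_unit


def estimate_seconds(n_units):
    """Extrapolate the total export time for n_units under the linear model."""
    return fixed + per_unit * n_units


print(f"{per_unit:.2f} s/unit, {fixed:.2f} s fixed overhead")
print(f"100 units: about {estimate_seconds(100):.0f} s")
```

Under this fit the per-unit cost is a little under 0.5 s, so 100 units come out at roughly 45 s, in line with the estimate of about a minute.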

Would you be willing to update to main and see if you still have the slowdown in performance?

It is also important to note that for these tests I kept my analyzer in the in-memory format rather than on disk, which is a feature of the SortingAnalyzer that may provide some speed boost.

samuelgarcia · 2024-04-29T08:51:43Z

Hi,
thanks for pointing this out.
Yes, adding more options to remove the waveforms (or similar) to speed up plot_unit_summary is a good idea.
Feel free to go in this direction with a PR if you have time for this.

Another good idea would be to use a PoolWorker to make the unit figures in parallel, but this is more work...
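A minimal sketch of the parallel idea, using the standard library's concurrent.futures rather than any spikeinterface machinery. The function render_unit_placeholder is a hypothetical stand-in for the real per-unit figure step, and render_units_in_parallel is an illustrative name, not an existing API.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def render_unit_placeholder(unit_id):
    """Hypothetical stand-in for the per-unit figure step; returns the output name."""
    return f"unit_{unit_id}.png"


def render_units_in_parallel(unit_ids, render_one, max_workers=None,
                             executor_cls=ProcessPoolExecutor):
    """Apply render_one to every unit id across a pool, preserving input order."""
    with executor_cls(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as unit_ids.
        return list(pool.map(render_one, unit_ids))
```

With a process pool, the render function must be importable at module level and the call should sit under an `if __name__ == "__main__":` guard; passing executor_cls=ThreadPoolExecutor is convenient for a quick check of the plumbing. A real version would also need each worker to load the analyzer and use a non-interactive matplotlib backend, which is where the extra work comes in.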

zm711 added the "exporters" (Related to exporters module) and "performance" (Performance issues/improvements) labels on Apr 26, 2024
