Hi there — thank you all for maintaining this suite of great tools! I am trying to write a script that handles pre-processing / sorting / post-processing all at once.
Describe the issue
I've noticed that the export_report function is very slow — almost 30 seconds per unit. If I have a few hundred units, that's an extra hour, which is in some cases more than the entire pre-processing + sorting!
My recording is 1 hour long, and the units seem to have relatively low firing rates (1-20 Hz), so I don't think that's the issue. This is on version 0.100.4, so I'm willing to be told that these issues have been solved, but just poking around the code a bit, it doesn't look like it.
Reproducing
I can reproduce this by simply re-loading the waveform extractor from a folder like `we = si.load_waveforms(waveform_dir)`, and then running `export_report(we, output_folder=qc_dir, **job_kwargs)`. Everything is fast except generating the per-unit plots at the end.
Ideas
I dug around a bit, and I have two clues:
1. The slowest part seems to be `sw.plot_unit_waveforms`, which takes ~20 seconds on its own. I can't figure out why it's slow, as loading the templates / waveforms (i.e. `we.get_waveforms(60)`) is very fast. I guess matplotlib is slow to plot hundreds of lines? Speaking for myself, I find the smear of all the raw waveforms totally uninformative, and I would rather have a mean +/- std, which I assume would also be much faster.
2. For the amplitudes part of the plot: running something like `sw.plot_amplitudes(we, unit_ids=[60])` directly takes roughly the same amount of time (~13 seconds) as just loading the data (the loading line in `spikeinterface/src/spikeinterface/widgets/amplitudes.py`, line 56 at `7d0e1da`).
I see that the point of loading all the data is to allow the widget to plot arbitrarily many units at once, but in the case where we plot only one unit per instance of the widget, many times over, this is a huge time sink. I admit I don't really see why loading the amplitude data takes so long; if I just run `np.load("/path/to/amplitude_segment_0.npy")` it's very fast (< 1 sec).
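To illustrate the mean +/- std idea from clue 1: a minimal sketch of the kind of plot I mean, using a synthetic array standing in for `we.get_waveforms(unit_id)` (the array shape and noise here are made up for illustration). This draws one line plus a shaded band instead of hundreds of individual traces:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Synthetic stand-in for we.get_waveforms(unit_id): (n_spikes, n_samples).
rng = np.random.default_rng(0)
n_spikes, n_samples = 500, 90
template = np.sin(np.linspace(0, 2 * np.pi, n_samples))
waveforms = template + 0.3 * rng.normal(size=(n_spikes, n_samples))

mean = waveforms.mean(axis=0)
std = waveforms.std(axis=0)

fig, ax = plt.subplots()
t = np.arange(n_samples)
ax.plot(t, mean, lw=1.5)                       # one line instead of n_spikes lines
ax.fill_between(t, mean - std, mean + std, alpha=0.3)  # +/- 1 std band
fig.savefig("unit_mean_std.png")
plt.close(fig)
```

Matplotlib's cost scales with the number of artists, so collapsing 500 `plot` calls into one line and one `fill_between` polygon should remove most of the per-unit rendering time.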
So in summary, I would suggest:
- have an option to plot mean/std instead of raw waveforms, or to not show waveforms at all, in the `export_report` function.
- figure out a way to make loading unit amplitudes as fast as loading the waveform data.
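On the second suggestion, one possible mitigation sketch: memory-map the per-segment amplitude file so only the selected unit's values are actually read from disk. The file layout below (one flat `.npy` per segment plus a parallel array of per-spike unit indices) is an assumption for illustration, not the actual spikeinterface storage format:

```python
import os
import tempfile
import numpy as np

# Hypothetical layout: flat amplitude .npy per segment, plus a parallel
# array giving each spike's unit index (both names are assumptions).
tmp_dir = tempfile.mkdtemp()
amp_path = os.path.join(tmp_dir, "amplitude_segment_0.npy")
amps = np.random.default_rng(1).normal(size=100_000)
unit_index = np.random.default_rng(2).integers(0, 100, size=100_000)
np.save(amp_path, amps)

# mmap_mode="r" avoids reading the whole file up front; only the pages
# touched by the mask lookup are pulled from disk.
amps_mm = np.load(amp_path, mmap_mode="r")
mask = unit_index == 60
unit_amps = np.asarray(amps_mm[mask])  # amplitudes for unit 60 only
```

If the slow part is really the eager full-file load, a memory-mapped read per unit should be close to the < 1 sec `np.load` timing above.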
So it seems to scale roughly with the number of units: at ~0.5 sec/unit, 100 units would take about 1 minute. For reference, this was based on a 75-minute recording.
Would you be willing to update to `main` and see if you still have the performance slowdown?
Also important to note: for these tests I kept my analyzer in in-memory format rather than on disk, which is a `SortingAnalyzer` feature that may provide some speed boost.
Hi,
thanks for pointing this out.
Yes, adding more options to remove waveforms (or similar) to speed up `plot_unit_summary` is a good idea.
Feel free to go in this direction with a PR if you have time.
It would also be a good idea to use a pool of workers to generate the unit figures in parallel, but this is more work...