Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extracting traces in parallel to speed up get_noise_levels (or any other traces related functions) #2382

Open
wants to merge 34 commits into
base: main
Choose a base branch
from

Conversation

yger
Copy link
Collaborator

@yger yger commented Jan 2, 2024

We can extent the pipeline machinery as explained in #2380 in order to get data chunks in parallel. The list of chunks passed to the ChunkRecordingExecutor can be customized accordingly, to avoid looping over unnecessary chunks

@yger
Copy link
Collaborator Author

yger commented Jan 3, 2024

Is it normal than the behavior of get_random_data_chunks, in main, can allow overlapping chunks? This seems weird to me, and thus the tests are not passing because of some behavior I would not expect. I mean, intuitively when I wan to select N random chunks in a small recording, I expect them to be non-overlapping (to avoid biasing the stats) @alejoe91 @samuelgarcia . But currently, this is not the case....

@yger
Copy link
Collaborator Author

yger commented Jan 3, 2024

Note that I'm well aware that the parallelism is adding an rather large overhead, thus such a process can be useful when I/O are slow, for example when getting data from a remote location. Need to be discuss

@alejoe91
Copy link
Member

alejoe91 commented Jan 3, 2024

Is it normal than the behavior of get_random_data_chunks, in main, can allow overlapping chunks? This seems weird to me, and thus the tests are not passing because of some behavior I would not expect. I mean, intuitively when I wan to select N random chunks in a small recording, I expect them to be non-overlapping (to avoid biasing the stats) @alejoe91 @samuelgarcia . But currently, this is not the case....

I think that when possible the chunks should be non overlapping indeed

@yger yger marked this pull request as ready for review January 3, 2024 14:33
@yger yger added enhancement New feature or request core Changes to core module labels Jan 4, 2024
@samuelgarcia
Copy link
Member

Salut Pierre.
I am OK with the idea but I am not sure to like the implementation.
The new run_traces_pipeline is more or less a ChunkRecordingExecutor that return traces.
The is not a super good idea because traces of pickle a transimited to main process which make lot a memory bandwith.
I will try to make another implementation with sharemem and without the pipeline mechanism.
The pipeline module is for peak or spikes adding functionnality for random chunk getter make the module more fuzy. no ?

@yger
Copy link
Collaborator Author

yger commented Jan 9, 2024

I am open to any suggestion. I just wanted to highlight the potential speedup there, especially for slow/remote I/O. Thanks a lot for having a look into that !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to core module enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants