default bandpass parameters #533

Open
martinwimpff opened this issue Dec 8, 2023 · 7 comments
@martinwimpff

class MotorImagery(SinglePass):
    """N-class motor imagery.
    Metric is 'roc-auc' if 2 classes and 'accuracy' if more
    Parameters
    -----------
    events: List of str
        event labels used to filter datasets (e.g. if only motor imagery is
        desired).
    n_classes: int,
        number of classes each dataset must have. If events is given,
        requires all imagery sorts to be within the events list.
    fmin: float (default 8)
        cutoff frequency (Hz) for the high pass filter
    fmax: float (default 32)
        cutoff frequency (Hz) for the low pass filter

Is there any reason why the default bandpass values are 8 and 32 Hz?
In my opinion these should be None, i.e. the default should be to apply no filter at all.

@bruAristimunha
Collaborator

Hey @martinwimpff!

Yes! There is a reason! It comes from MOABB's philosophy on reproducibility, a concept shaped by @vinay-jayaram and @alexandrebarachant.

About five years ago, before MOABB, papers often tuned things like the bandpass, baseline, etc. separately for each dataset and method. This made it impossible to compare them fairly. Authors usually had some extra tricks that made their method seem way better.

So, MOABB stepped in. They said, "Hold up! Let's treat all the datasets the same way, right from the start." They processed the raw data uniformly and evaluated the methods in the same way, whether within-session, cross-session, or cross-subject.

Just a heads-up, this preprocessing happens only when you're using the motor-imagery/ssvep/cvep/p300 paradigm object, and the band interval selection was based on studies by Fabian Lotte (BCI) or other similar literature for each paradigm.

If you prefer, you can easily remove the bandpass by tweaking the object's values. In Braindecode, you get the raw data without this preprocessing because we grab it using the dataset object.
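
As a rough illustration of "tweaking the object's values", the sketch below builds a `MotorImagery` paradigm with an explicit band instead of relying on the 8-32 Hz default. The `fmin`, `fmax`, and `n_classes` arguments come from the docstring quoted at the top of this issue; the helper names `make_paradigm` and `load_epochs` are made up for this example. MOABB is imported lazily and must be installed for the calls to work. Whether `None` is accepted for `fmin`/`fmax` is not confirmed in this thread, so the sketch only changes the band rather than disabling the filter.

```python
def make_paradigm(fmin=8.0, fmax=32.0, n_classes=4):
    """Return a MotorImagery paradigm with an explicitly chosen band-pass.

    Hypothetical helper: it only forwards the documented arguments so the
    filter band becomes a visible, reportable choice.
    """
    if not 0 < fmin < fmax:
        raise ValueError(f"expected 0 < fmin < fmax, got fmin={fmin}, fmax={fmax}")
    # Lazy import: MOABB is only needed once a paradigm is actually built.
    from moabb.paradigms import MotorImagery

    return MotorImagery(n_classes=n_classes, fmin=fmin, fmax=fmax)


def load_epochs(dataset, subjects, fmin=8.0, fmax=32.0, n_classes=4):
    """Load band-passed trials for `subjects` from a MOABB dataset object."""
    paradigm = make_paradigm(fmin=fmin, fmax=fmax, n_classes=n_classes)
    # get_data returns the trials as an ndarray, their labels, and metadata.
    X, labels, meta = paradigm.get_data(dataset=dataset, subjects=subjects)
    return X, labels, meta
```

For example, `load_epochs(dataset, subjects=[1], fmin=4.0, fmax=40.0)` would request the wider 4-40 Hz band for one subject, where `dataset` is a compatible motor imagery dataset object.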

@bruAristimunha
Collaborator

Tagging @sylvchev in case you want to add anything.

@martinwimpff
Author

Hi Bruno,

thanks for the fast response!
I get the reproducibility point and I fully understand the comparison point.
However, the 8-32 Hz bandpass is far from perfect for most DL models (at least for the "most important" dataset, the BCIC IV 2a dataset). Therefore, I fear that many models will not get their best results using the standard 8-32 Hz bandpass. If people find this out, they will stop using MOABB's standard configuration, which would go against the original intention of MOABB.

Best,
Martin

@sylvchev
Member

I understand your concern. Do you have any results, or do you know of any publication, investigating the influence of bandpass filters on DL models across several datasets (more than just BCIC IV 2a)?
Anyway, those values are only the defaults and can be changed if needed.

@vinay-jayaram
Collaborator

If there are results suggesting that the bandpass is not helpful in DL situations then that's a good argument to change the default, but otherwise the original definition was to deal with the fact that the 8-32 bandpass also mitigates movement and muscle artifacts. Especially for DL it's important -- if you want to make a claim about brain interfacing and not simply EEG -- to provide evidence or use methods to convince the reader that the models aren't taking advantage of non-brain information as well.

I also strongly agree with Sylvain -- a single dataset should not be considered more important than any other (especially the BCIC datasets! They've been overfitted to for decades) unless it's large enough to offer population coverage or was recorded on the same hardware setup as a planned closed-loop study.

@martinwimpff
Author

Thanks for your responses @sylvchev @vinay-jayaram!
@sylvchev no official publication, just personal experience.
@vinay-jayaram I get your point and those artifacts might be present as discussed in this publication. However, they don't use the BCIC datasets and they use a 4-40Hz filter and discard the first second after the cue to "overcome" this issue. Their investigation is good but not complete.

DL often uses 4-40Hz BP whereas CSP & Co. tend to use smaller bandwidths like 8-32Hz. I personally like it when the default parameters don't change the original data at all such that every modification of the original data is an "active choice" which should then be mentioned in the paper.

Finally, I think this is more about personal preference so it is not necessary to change the default parameters. However, I still see the problem that people may be discouraged from using MOABB as it was intended.
The solution for future datasets would be to define the preprocessing (e.g. 8-32Hz BP) upfront.
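
To make the difference between the two bands concrete, here is a minimal sketch in plain Python that designs a Hamming-windowed sinc band-pass for each band and evaluates its gain at a few probe frequencies. It is not the filter MOABB applies; the 250 Hz sampling rate, the 501-tap length, and the probe frequencies are assumptions chosen purely for illustration.

```python
import cmath
import math


def bandpass_taps(fmin, fmax, fs, numtaps=501):
    """Hamming-windowed sinc band-pass FIR taps (numtaps should be odd)."""
    mid = (numtaps - 1) // 2
    taps = []
    for n in range(numtaps):
        k = n - mid
        if k == 0:
            # Limit of the sinc difference at the centre tap.
            ideal = 2.0 * (fmax - fmin) / fs
        else:
            # Ideal band-pass = low-pass at fmax minus low-pass at fmin.
            ideal = (
                math.sin(2.0 * math.pi * fmax * k / fs)
                - math.sin(2.0 * math.pi * fmin * k / fs)
            ) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (numtaps - 1))
        taps.append(ideal * window)
    return taps


def gain(taps, freq, fs):
    """Magnitude of the filter's frequency response at `freq` Hz."""
    response = sum(
        h * cmath.exp(-2j * math.pi * freq * n / fs) for n, h in enumerate(taps)
    )
    return abs(response)


FS = 250.0  # assumed sampling rate in Hz
NARROW = bandpass_taps(8.0, 32.0, FS)  # the default band discussed here
WIDE = bandpass_taps(4.0, 40.0, FS)    # the band often used with DL models

# Probe frequencies (assumed): slow drift, 4-8 Hz range, mu rhythm,
# 32-40 Hz range, and a high-frequency component.
for freq in (2.0, 6.0, 12.0, 36.0, 60.0):
    print(
        f"{freq:5.1f} Hz  8-32 Hz gain: {gain(NARROW, freq, FS):.3f}"
        f"  4-40 Hz gain: {gain(WIDE, freq, FS):.3f}"
    )
```

Under these assumptions both filters suppress the 2 Hz and 60 Hz probes and pass the 12 Hz one; they differ only at 6 Hz and 36 Hz, which the 4-40 Hz band keeps and the 8-32 Hz band removes.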

@sylvchev
Member

Do you obtain a noticeable difference with 4-40 Hz instead of 8-32Hz and with what kind of DL models?

Regarding leaving the data as is and making the filtering more visible (with a preprocessing step), I understand your point, but the MOABB community is very diverse. We provide the data "as is" with the dataset object, and you can do what you want with it (useful for neuroscience folks). If you are more on the ML side and don't know much about EEG, the paradigms are there to ensure that the preprocessing is correct and that you get an ndarray that is easy to handle.

Indeed, DL has blurred the lines with end-to-end approaches. With @PierreGtch, we are adding the possibility of batch preprocessing: define it once for all your data and save the transformed dataset for reuse, without needing to reapply the preprocessing steps.

Only a few users are knowledgeable in both EEG and ML, and they know how the data is processed or know how to find the information.


4 participants