Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add ability to reject epochs using callables #12195

Merged
merged 54 commits into from
Feb 2, 2024

Conversation

withmywoessner
Copy link
Contributor

@withmywoessner withmywoessner commented Nov 11, 2023

Reference issue

Example: Fixes ##11775.

What does this implement/fix?

Add ability to reject epochs using functions.
Ex: epochs = mne.Epochs(..., reject=dict(eeg=lambda x: True if np.median(x) > 1e-5 else False))

@withmywoessner
Copy link
Contributor Author

I am still working on this. Does anyone know of any good testing data that has massive amplitude spikes?

@larsoner
Copy link
Member

The testing dataset should have some spikes due to blinks but I'm not sure they'd qualify as massive. You can always create one using RawArray or EpochsArray, and it can end up being nice and readable/explicit that way.

@withmywoessner
Copy link
Contributor Author

Thanks @larsoner! I am getting ready to write a tutorial (and the test though this question applies to both) as per @drammock's suggestion. First, I wanted to ask is the way I generated sample bad data appropriate or should I use the source simulator class (I did not really know how to get the desired large amplitude spike easily)? Here is what I made:

data_path = mne.datasets.testing.data_path(download=False)
fname_raw_testing = data_path / "MEG" / "sample" / "sample_audvis_trunc_raw.fif"
raw = mne.io.read_raw_fif(fname_raw_testing)
raw.crop(0, 5)
raw.del_proj()
chans = raw.info['ch_names'][-6:-1]
raw.pick(chans)
data = raw.get_data()

new_data = data

# Make large spike
new_data[0, 180:200] *= 1e7

# Shift amp up to calculate median/mean later
new_data[0, 610:880] += 1e-3
edit_raw = mne.io.RawArray(new_data, raw.info)

Second, I wanted to include a section of the tutorial for including boolean operations such as:

epochs = mne.Epochs(edit_raw, 
                    events, 
                    tmin=0, 
                    tmax=1, 
                    baseline=None, 
                    reject=dict(eeg=lambda x: True if ((np.max(x, axis=1) > 1).any() or 
                                                       (np.median(x, axis=1) > 1e-3).any()) else False),
                    preload=True, verbose='DEBUG'
) 

In the tutorial is it better to place the complicated lambda function in the construction or declare a new function like:

# Preparing a function for the 'reject' parameter
def reject_criteria(x):
    max_condition = np.max(x, axis=1) > 1
    median_condition = np.median(x, axis=1) > 1e-3
    return True if max_condition.any() or median_condition.any() else False

# Creating epochs with specified parameters
epochs = mne.Epochs(
    raw=edit_raw, 
    events=events, 
    tmin=0, 
    tmax=1, 
    baseline=None, 
    reject=dict(eeg=reject_criteria),
    preload=True, 
    verbose='DEBUG'
)

Third, I was wondering if instead of modifying reject and flat, it would be better to just create a new parameter reject_func as it may be confusing to a user if reject or flat can do the same thing depending whether you use > or <.

Thanks for the help! :)

@larsoner
Copy link
Member

First, I wanted to ask is the way I generated sample bad data appropriate

I think using SourceSimulator would be overkill and direct data modification is fine!

In the tutorial is it better to place the complicated lambda function in the construction or declare a new function like:

New function is better/clearer, probably added as a short new section in https://mne.tools/stable/auto_tutorials/preprocessing/20_rejecting_bad_data.html / https://github.com/mne-tools/mne-python/blob/main/tutorials/preprocessing/20_rejecting_bad_data.py

Third, I was wondering if instead of modifying reject and flat, it would be better to just create a new parameter reject_func as it may be confusing to a user if reject or flat can do the same thing depending whether you use > or <.

Good question. I think as long as we clarify in the doc that the callable could detect all sorts of stuff -- either too large or too small amplitudes -- and that what goes in drop_log reflects whether it came from reject or flat then it's fine.

@withmywoessner
Copy link
Contributor Author

withmywoessner commented Nov 13, 2023

Okay, I was just asking because I personally don't think I would ever use both if I was trying to specify a minimum and a maximum absolute value and would just use boolean operators. In the tutorial, is it fine if I just use reject and booleans to specify the min/max or should I use both to show it is possible @larsoner?

@withmywoessner
Copy link
Contributor Author

withmywoessner commented Nov 16, 2023

I think as long as we clarify in the doc that the callable could detect all sorts of stuff -- either too large or too small amplitudes -- and that what goes in drop_log ref

I was looking over the documentation and I see the drop_log as 4 main reasons listed:
image
Currently, when you drop based on amplitude threshold (and my implementation) it just gives the channel name:
image
Should I add a new reason, 'REJECT_CRITERIA', for when you drop based on reject or flat that includes the function or value used to drop?

@withmywoessner
Copy link
Contributor Author

@larsoner I think I am almost done with this. When you have a chance can you look over my comments and let me how you think I should address those problems. Thanks so much!

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like good progress so far!

Should I add a new reason, 'REJECT_CRITERIA', for when you drop based on reject or flat that includes the function or value used to drop?

I completely forgot about this. Based on the docs of epochs.drop(..., reason="USER"), the string can actually be arbitrary. Moreover it can be a list/tuple of reasons. So how about if we require that the callable return a good, reason tuple like:

epochs.drop_bad(reject=lambda epoch: (epoch.max() > 1e-3, "> 1mV somewhere"))

Where good must be bool and reason must be str, list, tuple where each entry is a str. This allows for nice stuff like:

def my_reject(epoch_data):
    bad_idxs = np.where(np.percentile(epoch_data, axis=1) > 1e-3)
    reasons = tuple(epochs.ch_name[bad_idx] for bad_idx in bad_idxs)
    return len(bad_chs) > 0, reasons

And it also makes me think we should:

  1. Update epochs.drop(...) to allow reason=... to be a list or tuple of reasons rather than a single string at some point. But doesn't have to be this PR unless you're really motivated.
  2. Amend the epochs.drop_log attribute doc to mention that the entry can actually be some ch_name from epochs.ch_names as well when dropping based on flat or reject. This would be great and not too difficult to add to this PR, but not 100% necessary (I can create a new issue).

mne/epochs.py Outdated
@@ -819,9 +819,14 @@ def _reject_setup(self, reject, flat):
# check for invalid values
for rej, kind in zip((reject, flat), ("Rejection", "Flat")):
for key, val in rej.items():
if val is None or val < 0:
if callable(val):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would simplify to:

if callable(val):
    continue
else:
    name = f"{kind} dict value for {key}"
    _validate_type(val, "numeric", name, extra="or callable")
    if val < 0:
         raise ValueError("{kind} {name} must be >= 0, got {val}")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to do it this way, but it breaks a test. I think its because _validate_type returns a TypeError and Pytest wants a ValueError:

   for val in (None, -1):  # protect against older MNE-C types
        for kwarg in ("reject", "flat"):
            pytest.raises(
                ValueError,
                Epochs,
                raw,
                events,
                event_id,
                tmin,
                tmax,
                picks=picks_meg,
                preload=False,
                **{kwarg: dict(grad=val)},
            )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just update the test, TypeError is a better error in the case where the type is wrong (i.e., _validate_type does something better than what we do in main)

mne/epochs.py Show resolved Hide resolved
mne/epochs.py Outdated
Comment on lines 855 to 857
if not is_callable:
# make sure new thresholds are at least as stringent
# as the old ones
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... but I think you can make these checks more fine-grained without much more work. Instead of if any are callable, skip all checks (which is what you're doing now), you should within this loop be able to fairly easily make it so that if this type is or was callable, skip this check. I think it almost amounts to moving the conditional from above into this loop with a continue when callables are or were present. As a bonus, your diff will probably be smaller, too!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I edit my test to check for this?

mne/epochs.py Outdated Show resolved Hide resolved
mne/utils/docs.py Outdated Show resolved Hide resolved
tutorials/preprocessing/20_rejecting_bad_data.py Outdated Show resolved Hide resolved
tutorials/preprocessing/20_rejecting_bad_data.py Outdated Show resolved Hide resolved
withmywoessner and others added 2 commits December 1, 2023 14:28
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
@larsoner
Copy link
Member

Yes it's possible to check preload. But if you go the .drop_bad(...)-only route it's not necessary. It will go through and load each epoch from raw on demand (or from ndarray if it's preloaded) and decide whether to keep or reject it. So maybe it's cleaner to use that rather than try to support it in __init__(...) with some processing delayed until later.

Would supporting just .drop_bad(...) work for your use case?

@withmywoessner
Copy link
Contributor Author

withmywoessner commented Jan 24, 2024

Yes @larsoner! I think I might write new docustrings for drop_bad because it currently it shares docustrings with reject and flat in the constructor of Epochs. I think it would be confusing if this functionality was mentioned in the docustrings for both.

@withmywoessner withmywoessner marked this pull request as ready for review January 25, 2024 01:27
@withmywoessner
Copy link
Contributor Author

Hey @larsoner, just pinging here. I think everything should be ready finally. Thanks so much for helping me with this :)

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two tiny things I noticed, will commit and mark for merge-when-green. Thanks for working through my confusions on this one @withmywoessner !

doc/changes/devel/12195.newfeature.rst Outdated Show resolved Hide resolved
mne/utils/mixin.py Outdated Show resolved Hide resolved
@larsoner larsoner changed the title [WIP] [ENH] Add ability to reject epochs using callables ENH: Add ability to reject epochs using callables Feb 2, 2024
@larsoner larsoner enabled auto-merge (squash) February 2, 2024 14:32
@larsoner larsoner merged commit 3a42bb9 into mne-tools:main Feb 2, 2024
29 checks passed
@withmywoessner withmywoessner deleted the epoch_reject branch February 2, 2024 16:44
@drammock
Copy link
Member

drammock commented Feb 2, 2024

maybe I'm misunderstanding something, but it looks like Circle's build_docs job passed in 8 seconds (?!) on the last commit of this PR, and didn't actually build the docs... which are now failing on #12411 with

/home/circleci/project/mne/epochs.py:docstring of mne.epochs.drop_bad:49: WARNING: Explicit markup ends without a blank line; unexpected unindent.

@larsoner any idea why we got a false-positive pass from Circle here?

@larsoner
Copy link
Member

larsoner commented Feb 2, 2024

Looks like for some reason it ran on his fork rather than our project. Not sure why

@withmywoessner
Copy link
Contributor Author

withmywoessner commented Feb 2, 2024

Looks like for some reason it ran on his fork rather than our project. Not sure why

https://support.circleci.com/hc/en-us/articles/360008097173-Troubleshooting-why-pull-requests-are-not-triggering-jobs-on-my-organization

Is this the reason why? @larsoner @drammock

A user who submits a pull request to your repository from a fork, but no pipeline is triggered with the pull request. This can happen when the user is following the project fork on their personal account rather than the project itself on CircleCI.

In these cases, the user unfollows their fork of the project on CircleCI. This will trigger their jobs to run under the organization when they submit pull requests. Those users can optionally follow the source project if they wish to see the pipelines.

I clicked unfollow.
image

@drammock
Copy link
Member

drammock commented Feb 2, 2024

I clicked unfollow.

thanks! hopefully that will fix it for your next PR.

@withmywoessner
Copy link
Contributor Author

withmywoessner commented Feb 2, 2024

I had some problems before where CircleCI was not running because it said I was an unregistered user. I think that's why I clicked follow. Do you know how to fix this? @drammock

@drammock
Copy link
Member

drammock commented Feb 2, 2024

I think what is supposed to happen is that if you create a CircleCI account that is linked to your GH account (maybe they always are?) then Circle will run the job automatically as long as it's not your first PR in the MNE-Python repo. But to be honest I haven't really had time to look into the situation / failures very carefully. Maybe I should create a new GH account so I can try it out / experience the rejection myself, and then document the fix 😆

@withmywoessner
Copy link
Contributor Author

Thanks! @drammock I thought I had done that.
image
Looking back at my last PR #12192 @larsoner still had to run the test for me because I got the block-unregistered-user error. Do I need to be a part of the mne-python organization in CircleCI?
image

@drammock
Copy link
Member

drammock commented Feb 2, 2024

I don't know :( I just poked around in the admin interface and I see no list of "members" and no way to "add" people... your list of organizations looks odd though (mine lists various user's names with images, and various named orgs, but no hash-like codes like cci-3rq7p like yours does). Do you see a link at the bottom of that dropdown saying "Can't find an organization? Check permissions"

@withmywoessner
Copy link
Contributor Author

withmywoessner commented Feb 2, 2024

Yes,
image
I click that and it takes me here:
image
I click, 'check permissions...' and it takes me:
image

Is there anything linking your CircleCI account to the mne-python public repo on your CircleCI account page? I don't see anything like that on mine.

Thanks for looking into this! @drammock

@drammock
Copy link
Member

drammock commented Feb 2, 2024

ok, IIUC they are using GitHub users & organizations as their source of "organizations" and since you're not a member of https://github.com/orgs/mne-tools you shouldn't expect to see "mne-tools" as an option within CircleCI. But since I am a member, I do see "mne-tools" as an org within CircleCI (along with other orgs I'm part of: pydata, sphinx-gallery, etc).

So I think that part at least is working as expected? (I don't see any orgs listed on your GitHub profile so it makes sense you wouldn't see any within CircleCI). But I think (?) this shouldn't prevent the jobs from running; hopefully it was the "follow" setting that you've just changed; we'll know on your next PR I guess.

@withmywoessner
Copy link
Contributor Author

Hopefully, @drammock 😬. Is there a way to change where CircleCI is run from in an existing PR?
Looking at my open PR #12392. I just merged the branch and it still says the tests are being run from my branch:
image

Compared to this one from #12382:
image

It also has a different icon in the top left and says 'mne-tools' when I clicked on it.

@drammock
Copy link
Member

drammock commented Feb 2, 2024

#12392 looks correct to me:

Screenshot 2024-02-02 at 14-51-13 build_docs (63616) - mne-tools_mne-python

not sure why you're seeing something different. To get to that page I clicked on the "Details" link on #12392:

Screenshot 2024-02-02 at 14-52-40 Create functions to read Epochs and Evoked Neuroscan formats by withmywoessner · Pull Request #12392 · mne-tools_mne-python

@withmywoessner
Copy link
Contributor Author

It looks good now. Maybe I just clicked it too early. Thanks!

snwnde pushed a commit to snwnde/mne-python that referenced this pull request Mar 20, 2024
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants