
raw.crop doesn't adjust onset of annotations for eyetracking data leading to interpolation errors #12408

schekroud opened this issue Feb 1, 2024 · 10 comments · May be fixed by #12415

@schekroud

Description of the problem

Specifically for eyetracking data (not sure about other types), the raw.crop method trims the signal appropriately, but no adjustment is made to the onset times of any annotations that remain in the data. This is particularly problematic when you then use mne.preprocessing.eyetracking.interpolate_blinks, which will interpolate over the wrong segments of the signal.

I am not sure whether this is intended behaviour, as I think that, in the past, the .crop function adjusted the onset times of events (before annotations were used). If I'm wrong, please send me away!

Steps to reproduce

import mne
from mne.datasets.eyelink import data_path
mne.viz.set_browser_backend('qt')

dpath = data_path() / "eeg-et" / "sub-01_task-plr_eyetrack.asc"
raw = mne.io.read_raw_eyelink(dpath, create_annotations=['blinks'])
raw.plot(scalings=dict(eyegaze=1e3))
raw = raw.crop(tmin=4)
raw = mne.preprocessing.eyetracking.interpolate_blinks(raw, buffer=0.15)
raw.plot(scalings=dict(eyegaze=1e3))  # interpolation happens exactly 4 s after the actual blink period

Link to data

No response

Expected results

It should interpolate the blinks (removing the blink artefact).

Actual results

It interpolates exactly X s after the blink, where X is the tmin applied in the raw.crop call.

Additional information

Platform macOS-14.2.1-arm64-arm-64bit
Python 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:37:07) [Clang 15.0.7 ]
Executable /Users/sammichekroud/anaconda3/envs/mne/bin/python
CPU arm (8 cores)
Memory 16.0 GB

Core
├☒ mne 1.6.0 (outdated, release 1.6.1 is available!)
├☑ numpy 1.26.2 (OpenBLAS 0.3.25 with 8 threads)
├☑ scipy 1.11.4
├☑ matplotlib 3.8.2 (backend=Qt5Agg)
├☑ pooch 1.8.0
└☑ jinja2 3.1.2

Numerical (optional)
├☑ sklearn 1.3.2
├☑ numba 0.58.1
├☑ nibabel 5.2.0
├☑ nilearn 0.10.2
├☑ dipy 1.7.0
├☑ openmeeg 2.5.7
├☑ pandas 2.1.4
└☐ unavailable cupy

Visualization (optional)
├☑ pyvista 0.43.1 (OpenGL 4.1 Metal - 88 via Apple M2)
├☑ pyvistaqt 0.11.0
├☑ vtk 9.2.6
├☑ qtpy 2.4.1 (PyQt5=5.15.2)
├☑ pyqtgraph 0.13.3
├☑ mne-qt-browser 0.6.1
├☑ ipywidgets 8.1.1
├☑ trame_client 2.14.1
├☑ trame_server 2.13.1
├☑ trame_vtk 2.6.3
├☑ trame_vuetify 2.3.1
└☐ unavailable ipympl

Ecosystem (optional)
└☐ unavailable mne-bids, mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline

@schekroud schekroud added the BUG label Feb 1, 2024

welcome bot commented Feb 1, 2024

Hello! 👋 Thanks for opening your first issue here! ❤️ We will try to get back to you soon. 🚴

@schekroud
Author

schekroud commented Feb 1, 2024

From looking at the source code for raw.crop, it looks like this happens because the annotations created by mne.io.read_raw_eyelink come with an orig_time set. With orig_time set, the onsets of the annotations are not adjusted to be relative to the first sample of the signal.
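For what it's worth, a minimal sketch (assuming, as here, that orig_time matches the measurement date) of how the onsets can be re-expressed relative to the cropped data by subtracting raw.first_time:

# Sketch only: express annotation onsets relative to the start of the cropped
# raw object by subtracting raw.first_time (the time of the first sample).
onsets_in_raw_times = raw.annotations.onset - raw.first_time
for onset, duration, desc in zip(
    onsets_in_raw_times, raw.annotations.duration, raw.annotations.description
):
    print(f"{desc}: {onset:.3f} to {onset + duration:.3f} s relative to raw.times[0]")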

I should say that cropping and then adjusting the onset of each annotation makes it so that the correct area of the signal is interpolated, but there is then a big mismatch in the apparent location of the blink:

# raw.plot(scalings=dict(eyegaze=1e3))
raw = raw.crop(tmin=2.0)
# adjust the onset of each annotation accordingly
for x in range(len(raw.annotations)):
    raw.annotations.onset[x] -= 2.0

raw = mne.preprocessing.eyetracking.interpolate_blinks(raw, buffer=[0.05, 0.2])
raw.plot(scalings=dict(eyegaze=1e3))  # interpolation now happens in the correct spot

The correct portions of the signal are now interpolated, but when you plot the data you can see that the annotations no longer line up with the blinks in the signal. Something weird is going on for sure.

@scott-huberty
Contributor

scott-huberty commented Feb 1, 2024

Thanks for the report!

I think we need to account for raw.first_time in the interpolate_blinks function, which will be non-zero after you call raw.crop:

def _interpolate_blinks(raw, buffer, blink_annots, interpolate_gaze):
    """Interpolate eyetracking signals during blinks in-place."""
    logger.info("Interpolating missing data during blinks...")
    pre_buffer, post_buffer = buffer
    # iterate over each eyetrack channel and interpolate the blinks
    for ci, ch_info in enumerate(raw.info["chs"]):
        if interpolate_gaze:  # interpolate over all eyetrack channels
            if ch_info["kind"] != FIFF.FIFFV_EYETRACK_CH:
                continue
        else:  # interpolate over pupil channels only
            if ch_info["coil_type"] != FIFF.FIFFV_COIL_EYETRACK_PUPIL:
                continue
        # Create an empty boolean mask
        mask = np.zeros_like(raw.times, dtype=bool)
        for annot in blink_annots:
            if "ch_names" not in annot or not annot["ch_names"]:
                msg = f"Blink annotation missing values for 'ch_names' key: {annot}"
                raise ValueError(msg)
            start = annot["onset"] - pre_buffer
            end = annot["onset"] + annot["duration"] + post_buffer
            if ch_info["ch_name"] not in annot["ch_names"]:
                continue  # skip if the channel is not in the blink annotation
            # Update the mask for times within the current blink period
            mask |= (raw.times >= start) & (raw.times <= end)
        blink_indices = np.where(mask)[0]
        non_blink_indices = np.where(~mask)[0]

Starting at line 89 above, something like:

        for annot in blink_annots:
            if "ch_names" not in annot or not annot["ch_names"]:
                msg = f"Blink annotation missing values for 'ch_names' key: {annot}"
                raise ValueError(msg)
            start = annot["onset"] - raw.first_time - pre_buffer   # critical change here
            end = annot["onset"] - raw.first_time + annot["duration"] + post_buffer  # critical change here

After that change, running your code above should still interpolate the blinks as expected:

[Screenshot: plot of the example data after applying the change, with the blinks interpolated as expected (2024-02-01)]

@schekroud
Author

Neat! Thanks for the quick fix. Would it be OK to change my pupillometry.py to include that fix until it's released?

Also, this is possibly not the right place to mention it, as it's less of an issue and more of a feature request: would it be possible to have this function return the non-interpolated, but still NaN-filled, signals? This is actually quite useful when you want to see how much data in a trial is missing (after epoching the raw data, it works well with epoched.plot_image) to aid in trial rejection for later analysis. If useful, I am very happy to be pointed to wherever is appropriate for feature requests to add this!
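For illustration, a rough sketch of that workflow (events and event_id are hypothetical placeholders, and the pupil channels are assumed to still contain NaNs during blinks rather than interpolated values):

# Hypothetical sketch: epoch the still-NaN pupil signal and visualise
# per-trial missingness; `events` / `event_id` are assumed to exist.
epochs = mne.Epochs(
    raw, events, event_id=event_id, tmin=-0.2, tmax=1.0,
    picks="pupil", baseline=None, preload=True,
)
epochs.plot_image(picks="pupil")  # NaN stretches show up as gaps per trial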

@scott-huberty
Contributor

> Neat! Thanks for the quick fix. Would it be OK to change my pupillometry.py to include that fix until it's released?

@schekroud would you be up to submit a PR with the fix? If we get stuck somewhere (our change breaks existing tests, etc.) I'll be happy to help troubleshoot as well.

> Also, this is possibly not the right place to mention it, as it's less of an issue and more of a feature request: would it be possible to have this function return the non-interpolated, but still NaN-filled, signals?

Like return a numpy array of the signals before interpolation? Sorry if I'm misunderstanding, but feel free to open up a new ticket if it's something you want to discuss more!

@larsoner
Member

larsoner commented Feb 2, 2024

> I think we need to account for raw.first_time in the interpolate_blinks function, which will be non-zero after you call raw.crop

To fix this, rather than adding any new code, I think the solution is to use _annotations_starts_stops:

https://github.com/mne-tools/mne-python/blob/main/mne/annotations.py#L1049

Internally it uses _sync_onset to deal with all the first_samp stuff the right way.

> Like return a numpy array of the signals before interpolation?

If this is all you need, raw["eyegaze"], raw.get_data(...), or similar should get you what you want.
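For example, something along these lines (a sketch only, run before calling interpolate_blinks, and assuming the blink samples are NaN in the raw pupil signal as produced by the Eyelink reader):

import numpy as np

# Grab the un-interpolated pupil signal (blinks still NaN) before interpolating,
# e.g. to quantify how much of the recording is missing per channel.
pupil_before = raw.get_data(picks="pupil")
print(np.isnan(pupil_before).mean(axis=1))  # fraction of NaN samples per pupil channel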

@scott-huberty
Contributor

scott-huberty commented Feb 2, 2024

> To fix this, rather than adding any new code, I think the solution is to use _annotations_starts_stops

OK, using _annotations_starts_stops (which returns start/stop sample indices, hence the conversion with sfreq below), an implementation could look like this (again starting from line 89 above):

starts, ends = _annotations_starts_stops(raw, "BAD_blink")  # New line
starts = np.divide(starts, raw.info["sfreq"])  # New line
ends = np.divide(ends, raw.info["sfreq"])  # New line
for annot, start, end in zip(blink_annots, starts, ends):  # Edited line
    if "ch_names" not in annot or not annot["ch_names"]:
        msg = f"Blink annotation missing values for 'ch_names' key: {annot}"
        raise ValueError(msg)
    start -= pre_buffer  # Edited line
    end += post_buffer  # Edited line
    ...

@schekroud
Author

@scott-huberty I'll submit the PR later on, sure.

> If this is all you need, raw["eyegaze"], raw.get_data(...), or similar should get you what you want.

Not quite. I'm talking about adding a channel during mne.preprocessing.eyetracking.interpolate_blinks where the channel contents are basically the output of:

# signal is your trace
# mask is a boolean array (np.zeros_like(signal, dtype=bool)) marking samples to be interpolated
signal[np.squeeze(np.where(mask))] = np.nan
# append signal as a new channel called e.g. pupil_nan

I'll open a new issue with a feature request to explain a bit more, but it is specifically quite useful for eyetracking to identify 'blinky' subjects that you may want to discard because you are interpolating large percentages of their data.

@larsoner
Member

larsoner commented Feb 2, 2024

> I'll open a new issue with a feature request to explain a bit more, but it is specifically quite useful for eyetracking to identify 'blinky' subjects that you may want to discard because you are interpolating large percentages of their data.

Sure, feel free. Maybe what we really need is a function to quantify how much of a given raw instance is "covered" by annotations of a specific type. In your case it could be something like mne.annotations.quantify_coverage(raw, "BAD_BLINK") or something similar.
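A rough sketch of what such a helper could compute (quantify_coverage is hypothetical and does not exist in MNE; this naive version double-counts overlapping annotations):

def annotation_coverage(raw, description="BAD_blink"):
    """Fraction of the recording covered by annotations matching `description`."""
    total = sum(
        dur
        for dur, desc in zip(raw.annotations.duration, raw.annotations.description)
        if desc == description
    )
    return total / raw.times[-1]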

@scott-huberty
Contributor

scott-huberty commented Feb 2, 2024

> @scott-huberty I'll submit the PR later on, sure.

Awesome!

> Maybe what we really need is a function to quantify how much of a given raw instance is "covered" by annotations of a specific type. In your case it could be something like mne.annotations.quantify_coverage(raw, "BAD_BLINK") or something similar.

If so, maybe we can just repurpose some of the code used for logging in mne.preprocessing.annotate_break into a function:

n_breaks = len(break_annotations)
break_times = [
    f"{o:.1f} – {o + d:.1f} s [{d:.1f} s]"
    for o, d in zip(break_annotations.onset, break_annotations.duration)
]
break_times = "\n    ".join(break_times)
total_break_dur = sum(break_annotations.duration)
fraction_breaks = total_break_dur / raw.times[-1]
logger.info(
    f"\nDetected {n_breaks} break period{_pl(n_breaks)} of >= "
    f"{min_break_duration} s duration:\n    {break_times}\n"
    f"In total, {round(100 * fraction_breaks, 1):.1f}% of the "
    f"data ({round(total_break_dur, 1):.1f} s) have been marked "
    f"as a break.\n"
)

@schekroud schekroud linked a pull request Feb 2, 2024 that will close this issue