Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

container for eye tracking related annotation information #12208

Open
eioe opened this issue Nov 14, 2023 · 32 comments · May be fixed by #12213
Open

container for eye tracking related annotation information #12208

eioe opened this issue Nov 14, 2023 · 32 comments · May be fixed by #12213
Labels
ENH sprint-2023 Issues reserved for the 2023 Intermediate Dev Training

Comments

@eioe
Copy link
Contributor

eioe commented Nov 14, 2023

Describe the new feature or enhancement

Currently, there is no dedicated way to store details about eye tracking related annotations. Several properties of saccades (e.g., peak velocity, start+end position, ...) and fixations (e.g., avg position) are useful information downstream. (This info can/will hopefully in future be generated by a [to be implemented] ET analysis module or [for now] be read from the Eyelink output file.)

Atm, annotations only allow to store onset, duration, description, (and orig_time) as they need to be generic. So extending Annotations with additional (eye tracking specific) fields is not an option.
For reference: EYE-EEG stores {'latency', 'duration', 'sac_vmax', 'sac_amplitude', 'sac_angle', 'epoch', 'sac_startpos_x', 'sac_startpos_y', 'sac_endpos_x', 'sac_endpos_y'} for saccades and {'latency', 'duration', 'fix_avgpos_x', 'fix_avgpos_y', 'epoch', 'sac_vmax', 'sac_amplitude', 'sac_angle', 'sac_startpos_x', 'sac_startpos_y', 'sac_endpos_x', 'sac_endpos_y'} for fixations (Code).

Describe your proposed implementation

[1] With @dominikwelke, @mscheltienne , @sappelhoff we discussed having a separate structure (probably a dict) that holds the according info and can be indexed with a unique identifier linked to a given annotation object. This identifier could be a hash formed by combining onset, duration, and description.

[2] Ideally, the information can still be linked once the annotations are transferred into events. I guess, the event_dict route could be extended accordingly but have not thought this through or discussed with others.

Describe possible alternatives

As discussed above, extending the Annotation class does not seem a viable alternative.

Alternatively, we can leave it to the user as the metrics can theoretically be calculated from the gaze data on the fly. This, however, seems to be tedious and non-trivial (esp., for parameters like peak velocity).

Additional context

@scott-huberty (and @larsoner ): do you have ideas/opinions regarding this? As you have implemented the bulk of the existing functionality regarding eye tracking data and probably already thought about similar questions. Thx!

@eioe eioe added the ENH label Nov 14, 2023
@eioe
Copy link
Contributor Author

eioe commented Nov 14, 2023

@drammock , just FYI: I'm working on this in the context of the sprint but I don't think I can label it accordingly myself.

@sappelhoff sappelhoff added the sprint-2023 Issues reserved for the 2023 Intermediate Dev Training label Nov 14, 2023
@agramfort
Copy link
Member

agramfort commented Nov 14, 2023 via email

@eioe
Copy link
Contributor Author

eioe commented Nov 14, 2023

👍 ahja nice, metadata (which I haven't worked with yet) sound like a viable round for [2] (ie., more detailled info on events).

challenge for raw data remains.

@mscheltienne
Copy link
Member

In the eyetracking analysis module, I would have functions which computes a given metric for an eyetracking event, e.g.

def compute_saccade_velocity(raw: BaseRaw) -> dict[Annotation, float]:
    pass

which takes in input a Raw object with the annotations on which the metric must be computed and returns a mapping between the annotation and the metric.

And then we can have higher level function/method/objects which will use those metric-computation function automatically and fill e.g. the epochs metadata.

@eioe
Copy link
Contributor Author

eioe commented Nov 14, 2023

looks nice. thx @mscheltienne. However, don't we "waste energy" this way? Normal procedure is that whatever determines where a saccades is, at the same time returns the saccade parameters.
If I understand your approach correctly, we would "recalculate" the parameters for an already determined annotation. Or am I misreading you?

How about an approach like:

sacc_dict = detect_saccades(raw) 

which annotates raw and returns a dict[Annotation, SaccParams] ?

@mscheltienne
Copy link
Member

This info can/will hopefully in future be generated by a [to be implemented] ET analysis module or [for now] be read from the Eyelink output file.

I was applying this sentence, implementing analysis functions which compute those metrics, regardless of the eyetracking device.

@eioe
Copy link
Contributor Author

eioe commented Nov 14, 2023

yes, but is in your scenario the raw which is passed to compute_saccade_velocity() already annotated with gaze events?

@mscheltienne
Copy link
Member

Yes, the events should be present in the raw, in the form of annotations. Did I misunderstood something here?
IMO, those events can come from 2 sources: the reader or from compute_function which take in input a Raw with eyegaze or pupil channels.

@drammock
Copy link
Member

drammock commented Nov 14, 2023

Let's try to solve one problem at a time.

  1. EyeLink data files include some of derived values of saccades and fixations, and we would like to (a) read them in, and (b) store them somewhere for later use
  2. There may be other measures not included in EyeLink files that we want to later calculate and store (or equivalently, other eyetracker systems that don't pre-calculate the values that EyeLink data format includes)

I think it is possible to solve (1) first and independently of (2), as long as we bear in mind that the values included in .edf files may not be the full set of values we care about. In other words, once we have a storage plan, we can then later decide how we populate that storage in cases where the values aren't coming from the data file.

As noted above, metadata dataframes are at present only supported for Epochs objects, and Annotations objects only have 4 fields (onset, duration, description, and channels). So logical options are:

  1. add ability to attach metadata dataframe to Raw. This seems not too bad to implement; I'm imagining some subset of the following columns: [onset_sample/onset_time, n_samples/duration, measure, value], where "measure" would take values like sacc_peak_vel or sacc_onset_angle and "value" is the corresponding numeric value. This would be "long format" data so if each saccade has 3 associated measures then there would be 3 rows for that saccade. The downside to long-format is that value must either always be float, or be an object-dtype column (which gets messy)... wide-format may be better if we leverage pandas' nullable dtypes for any string or integer columns). The big question in my mind is: if you assume that the metadata dataframe exists as just described, how you envision using that dataframe? Do you need additional helper functions? Does it need to persist when converting Raw to Epochs? Or do you just grab a copy of it and use pandas or seaborn to do whatever derivative computations you want, alongside (but separate from) the raw -> epochs -> evoked analyses?

  2. allow Annotations to have arbitrary extra fields. A basic choice is whether we allow top-level extra fields or whether we hide them all one level deep in something like Annotations.userdata or something like that. This feels like easiest to implement, but harder to use than the metadata dataframe. This plus a helper function like make_dataframe_from_annotation_userdata might be good enough, depending on the answers to the intended-usage questions in (1) above.

  3. shoe-horn the derived values into the existing Annotation description. This is bad because it requires storing name-value pairs as strings and then parsing them later to get at the numeric values. It also requires either multiple coincident annotations (one for each derived measure) --- which while maybe tractable in code, will be a nightmare to visualize during raw.plot() --- or a single annotation whose description contains multiple key-value pairs that need to be parsed from strings. This feels both fragile and also a possible security concern if we do the parsing carelessly.

@scott-huberty
Copy link
Contributor

EyeLink data files include some of derived values of saccades and fixations, and we would like to (a) read them in, and (b) store them somewhere for later use

I think read_raw_eyelink does parse the aforementioned saccadic metadata from file and stores it in a dataframe, but we only end up extracting the onset and duration for the annotations. Once we figure out (b), we could refactor this code to store that saccadic metadata in the right place.

As for (b) I'm slightly partial to the metadata dataframe proposal. I think if I were to try to use this metadata in an analysis, this method would be a lot easier to work with (for example I could merge my stimulus onset info into the dataframe, and look at amplitude of saccades within 1-second of stimulus onset).

My other thought is that annotations have a visualization component, and I don't know how we would visualize a userdata field, making the field kind of hidden?

Happy to hear others thoughts.

@eioe
Copy link
Contributor Author

eioe commented Nov 15, 2023

@drammock , great summary/battle plan. Thx! 🙏

EyeLink data files include some of derived values of saccades and fixations, and we would like to (a) read them in, and (b) store them somewhere for later use

As @scott-huberty mentioned and as far as I have seen, at present the reading (for .asc files) is already implemented, so we're mostly missing the storage part.

Regarding (b):

shoe-horn the derived values into the existing Annotation description. This is bad ...

I agree.

allow Annotations to have arbitrary extra fields.

I have not worked much enough with annotations (esp. their visualization) to have a strong opinion here. But I see your concerns. Also, I find Annotation.userdata.peak_velocity not very intuitive, tbh, as userdata could be anything.

add ability to attach metadata dataframe to Raw

tbh, I have some concerns also here:
a) Long format breaks with the logic of 1 row per epoch/event in a metadata DF which is in-place for metadata on Epochs atm. sure, metadata on Rawmay behave differently as there are no epochs, but we can interpret saccades/fixations as events and each of these would immediately have multiple rows in the metadata. At latest when epoching, we'd need to convert metadata to a different (wide) shape to adhere to the existing rule.
b) Related to a), I interpret the matter in a way that a saccade/fixation has a series of properties which define (onset_time, duration) and characterize (e.g., peak_velocity, pos_start_x, ...) it. This structure would be represented in the long format DF i a way as well, as the "grouping"/indexing would be done via [onset_time, duration]. However, would we want to add the information which kind of event it is (saccade or fixation)? Or is it good enough that this can be inferred from the types of values present (i.e., it's a fixation if fix_avgpos_x) is present?

Let me sketch the use cases which I have in mind:
[1] Fixation related potentials (FRP) dependent on fixation target:
I want to epoch my data to fixation onsets and then compare ERPs in epochs in which a face was fixated vs fixations on a car. The data stems from a free viewing experiment in which people were exploring a given scene. Positions of faces and cars in the scene are given as (x,y) tuples (same coord system as my ET data). So I'd want to filter my Annotations for those with .description == "fixation" and then further identify those for which any([dist(('fix_avgpos_x', 'fix_avgpos_y'), loc] < size_face for loc in face_locations]. Likewise for cars. Then I can easily convert these into events and use these for epoching.

[2] Plot FRP as a function of the size of the incoming saccade:
I'd want to epoch using all events/annotations with .description == "fixation" and then somehow be able to access for each of these epochs sac_amplitude (which, in the logic of EYE-EEG, see above) refers to the previous saccade.

[3] use gaze event properties to discard independetly epoched data:
I epoch my data using stimulus onset events, and then want to discard all trials for which there was a saccade with an amplitude larger than sacc_threshold within the first second. (I think, this is somewhat close to what @scott-huberty described). If I do the epoching first (and somehow the gaze event data is passed along), there could now also be multiple saccade "events" within my epoch.

Just putting this out for thought.
I'm digging a bit deeper into the existing functionality/logic of metadata for now, to have a better grasp how much it could be of help for these scenarios.

@eioe
Copy link
Contributor Author

eioe commented Nov 15, 2023

update:
after discussing this with @britta-wstnr and @mscheltienne , here's the current state of affairs:
[for the sake of simplicity, I will just write about saccades, but the same ofc applies to fixations and potentially blinks and other events maybe even unrelated to eye data.]

If saccade info lives in raw.metadata or even in a separate object it is challenging to make sure that it stays linked/synced with the according annotation object (i.e., in case of resampling, data rejection, ...). This challenge does not exist, when the data lives on the annotation object which needs to be handled/updated by MNE anyways.
We therefore converged towards option "2." as of in @drammock 's post above, that is to store the data in a custom structure, hidden under .userdata (or so) on the annotations object.
This structure

  1. should be a private property which is not to be touched by/exposed to the user but shall mostly be handled by functions.
  2. can be filled either from existing data (e.g. read in from file) or from MNE functions which (re)calculate the according features.
  3. should implement a specific interface, dependent on whether it is a "saccade", "fixation" ,...
  4. can (in the future) also be used for other, non-eye related information which shall be passed along with an annotation.
  5. shall be igbored by anything related to plotting.

when epoching the data (either by using saccade events or other events) the saccade info can be passed along to the Epochs by making use of the existing functionalities for metadata. For this, however, we will have to make sure how to pass along the info from annotation.userdata to the event based logic.

Any input/opinions from @drammock or @larsoner are welcome.
@britta-wstnr & @mscheltienne pls feel free to comment if I forgot or misunderstood something from our discussion.

I'm off for lunch now and would afterwards start working on a PR suggesting the introduction of .userdata on annotations.

Sidenote: @mscheltienne mentioned that we should introduce a "locking" feature/flag for annotations which makes sure that these annotations cannot be (accidentally) moved manually be the user. This also applies to saccade annotations which should never be moved manually. Definitely a separate issue though.

@mscheltienne
Copy link
Member

I would make this attribute private, and maybe call it metadata rather than userdata since the goal is to 'hide' it from the user perspective: annotation._metadata.
For the I/O roundtrip, I think we can get it to work depending on the structure stored in _metadata, after all the annotation I/O already feels like a bit of a hack:

mne-python/mne/annotations.py

Lines 1088 to 1104 in 9b57c51

def _write_annotations(fid, annotations):
"""Write annotations."""
start_block(fid, FIFF.FIFFB_MNE_ANNOTATIONS)
write_float(fid, FIFF.FIFF_MNE_BASELINE_MIN, annotations.onset)
write_float(
fid, FIFF.FIFF_MNE_BASELINE_MAX, annotations.duration + annotations.onset
)
write_name_list_sanitized(
fid, FIFF.FIFF_COMMENT, annotations.description, name="description"
)
if annotations.orig_time is not None:
write_double(fid, FIFF.FIFF_MEAS_DATE, _dt_to_stamp(annotations.orig_time))
if annotations._any_ch_names():
write_string(
fid, FIFF.FIFF_MNE_EPOCHS_DROP_LOG, json.dumps(tuple(annotations.ch_names))
)
end_block(fid, FIFF.FIFFB_MNE_ANNOTATIONS)

@drammock
Copy link
Member

  • I think I'm convinced that the convenience of a DataFrame is outweighed by the need to keep synchronized (which is much easier when storing on Annotations)
  • I like @mscheltienne's suggestion of annotations._metadata.
  • I agree that in most cases it would be helpful / prudent to somehow "lock" these annotations from having their start/end times altered by the user... but on the other hand what if EyeLink mis-calculates a value and the user wants to correct it slightly based on observation of the xpos, ypos, or pupil channels? In other words, we should probably have saccade/fixation (and maybe blink?) annotations locked by default but think about whether and how to let users unlock them. Could be done in a separate PR.

@eioe eioe linked a pull request Nov 15, 2023 that will close this issue
@eioe
Copy link
Contributor Author

eioe commented Nov 15, 2023

If someone of the core-devs could take a look at the PR, to see whether this is roughly what you had in mind, I'd be thankful (before I refactor more).

@hoechenberger
Copy link
Member

@drammock Wide format is what we use for epochs metadata, so I'd be in favor of using it for Raw as well

@drammock
Copy link
Member

If someone of the core-devs could take a look at the PR, to see whether this is roughly what you had in mind, I'd be thankful (before I refactor more).

We (@eioe, @britta-wstnr and I) discussed this live today. The tentative plan is:

  • annotation spans (i.e., my_annot[0], which is an OrderedDict) will have a new key _metadata which is a Python object (called AnnotMetadata or so), that is initialized with a kind string ("saccade", "fixation", "blink", etc).
  • Depending on the kind, the _metadata object will get a read-only property of either .saccade, .fixation etc.
  • That property will also be a Python object, with read-only properties that are relevant for the kind (e.g., .saccade.peak_vel, .saccade.start_pos, .saccade.end_pos, .fixation.mean_pos, etc).
  • the values for those properties will only be settable by helper functions (for now at least).
  • there will not be a new metadata parameter in the Annotations constructor (which is user-facing).
  • the container Annotations object (which can contain multiple annotation spans) will store the metadata as a list of AnnotMetadata objects (and perhaps Nones for spans that do not have metadata?)

@eioe and @britta-wstnr please correct anything I've mis-remembered.

@drammock
Copy link
Member

@hoechenberger

Wide format is what we use for epochs metadata, so I'd be in favor of using it for Raw as well

Ok, but the consensus seems to be to not use DataFrames at all:

I'm convinced that the convenience of a DataFrame is outweighed by the need to keep synchronized (which is much easier when storing on Annotations)

@hoechenberger
Copy link
Member

hoechenberger commented Nov 15, 2023

@hoechenberger

Wide format is what we use for epochs metadata, so I'd be in favor of using it for Raw as well

Ok, but the consensus seems to be to not use DataFrames at all:

I'm convinced that the convenience of a DataFrame is outweighed by the need to keep synchronized (which is much easier when storing on Annotations)

Yeah, I didn't understand what was meant by that 😅 Or rather, I thought the solution to that was to attach the metadata to Annotations – in whatever form we choose? Couldn't we choose a DataFrame, then? I would even go so far as to say, why not store all Annotations + metadata in a DataFrame while keeping the Annotations API the same (e.g. Annotations.onset would return Annotations._df["onset"].values)
Just some random thoughts

@eioe
Copy link
Contributor Author

eioe commented Nov 15, 2023

Thx @drammock for the summary of our discussion earlier today. I've just seen this now (🙈) after we talked and slightly changed plans again.

Here's roughly the new idea:

from mne.utils import _validate_type
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SaccadeAnnotMetadata:
  sac_vmax: float = None
  sac_amplitude: float = None
  sac_angle: float = None
  sac_startpos_x: float = None
  sac_startpos_y: float = None
  sac_endpos_x: float = None
  sac_endpos_y: float = None

  def __post_init__(self):
    for field in fields(self):
      _validate_type(getattr(self, field.name), float)
    

@dataclass(frozen=True)
class FixationAnnotMetadata:
  fix_avgpos_x: float = None
  fix_avgpos_y: float = None
  incoming_saccade: SaccadeAnnotMetadata = None

  def __post_init__(self):
    _validate_type(self.fix_avgpos_x, float)
    _validate_type(self.fix_avgpos_y, float)
    _validate_type(self.incoming_saccade, SaccadeAnnotMetadata)

Some thoughts:

  • We can use immutable dataclasses, as they only need to be filled on creation (e.g., when parsed from a file or calculated by some helper function); the ydo not need to be changed after creation. This implements the "readonly" idea that @drammock mentioned above.
  • There does not seem to be a nice way to use dataclasses without type annotations. I know, they are not very common in the MNE codebase—but are also used in other places where dataclasses are already in use.

@eioe
Copy link
Contributor Author

eioe commented Nov 16, 2023

@hoechenberger

why not store all Annotations + metadata in a DataFrame while keeping the Annotations API the same

I think, one reason that speaks against this is that different annotations may have different fields in their metadata (or no metadata at all) which would make Annotations._df a pretty sparse DF.

@eioe
Copy link
Contributor Author

eioe commented Nov 16, 2023

ok, here's another update.
After a loooong discussion with @britta-wstnr and @wmvanvliet , we decided (? ehehe) that adding ._metadata to annotations might at the end not be the route to go. (Mostly bc of concerns that users will get tempted to start stuffing other things into this field which will lead to bad things esp bc annotations are written to .fif.)
The new idea is to use a private ._id attribute on each annotation which has a unique identifier that can be used to link the annotation to metadata which lives in a separate object.

Here's an examplary workflow:
(for this to work, you need to checkout my latest changes made in #12213 811d387
which implements the ._id attribute.)

import numpy as np
import pandas as pd

import mne
from mne.annotations import Annotations
from mne.utils import _validate_type



class EyeEvents:
  def __init__(self, annots=None, event_data=None):
    self._fixations = pd.DataFrame(
      columns=['annot_id', 'fix_avgpos_x', 'fix_avgpos_y', 'incoming_saccade_id']
    )
    self._saccades = pd.DataFrame(
      columns=['annot_id', 'sac_vmax', 'sac_amplitude', 'sac_angle',
               'sac_startpos_x', 'sac_startpos_y', 'sac_endpos_x',
               'sac_endpos_y']
    )

    if (annots is not None):
      _validate_type(annots, Annotations)
      # make sure that we have the same number of annotations as event_data
      if len(annots) != len(event_data):
        raise ValueError(f'annots and event_data must have the same number ' +
                         f'of annotations. {len(annots)} != {len(event_data)}')
      for annot, event_datum in zip(annots, event_data):
        self.add_annot(annot, event_datum)

  def add_annot(self, annot, event_data):
    if annot['description'] == 'fixation':
      # make sure that we have the correct keys in event_data
      if np.all(np.isin([str(k) for k in event_data.keys()],
                        self._fixations.columns)):
        event_data['annot_id'] = annot['_id']
        tmp_df = pd.DataFrame([event_data])
        self._fixations = pd.concat([self._fixations, tmp_df])
      else:
        raise ValueError(f'event_data has incorrect keys. Fixation events ' +
                         f'have columns: {self._fixations.columns} but ' +
                         f'event_data has keys: {event_data.keys()}.')

    elif annot['description'] == 'saccade':
      # make sure that we have the correct keys in event_data
      if np.all(np.isin([str(k) for k in event_data.keys()],
                        self._saccades.columns)):
        event_data['annot_id'] = annot['_id']
        tmp_df = pd.DataFrame([event_data])
        self._saccades = pd.concat([self._saccades, tmp_df])
      else:
        raise ValueError(f'event_data has incorrect keys. Saccade events ' +
                         f'have columns: {self._saccades.columns} but ' +
                         f'event_data has keys: {event_data.columns}.')

    else:
      raise ValueError('Invalid annotation description. EyeEvents can only ' +
                       'handle annotations with description "Fixation" or ' +
                       '"Saccade".')
    

if __name__ == '__main__':

  # Dummy saccade data
  sac_vmax = np.array([0.5, 1, 2, 3, 4]) * 123
  sac_amplitude = np.array([4, 3, 2, 1, 0]) * 10
  sac_angle = np.array([0, 0.2, 0.4, 0.6, 0.8]) * 360
  sac_startpos_x = np.array([0, 1, 2, 3, 4])
  sac_startpos_y = np.array([0, 1, 2, 3, 4])
  sac_endpos_x = np.array([0, 1, 2, 3, 4]) + 10
  sac_endpos_y = np.array([0, 1, 2, 3, 4]) - 10

  # Dummy fixation data
  fix_pos_x = sac_endpos_x
  fix_pos_y = sac_endpos_y

  saccade_eventdata = [{'sac_vmax': vmax, 'sac_amplitude': amp,
                        'sac_angle': ang, 'sac_startpos_x': sx,
                        'sac_startpos_y': sy, 'sac_endpos_x': ex,
                        'sac_endpos_y': ey} for vmax, amp, ang, sx, sy, ex, ey 
                        in zip(sac_vmax, sac_amplitude, sac_angle, 
                               sac_startpos_x, sac_startpos_y, sac_endpos_x,
                               sac_endpos_y)]

  fixation_eventdata = [{'fix_avgpos_x': x, 'fix_avgpos_y': y } for x, y in
                        zip(fix_pos_x, fix_pos_y)]

  annots_sacc = Annotations(
    onset=[1, 5, 18, 33, 45],
    duration=np.ones(5),
    description=['saccade'] * 5
  )

  # We construct the fixation events so that they each occur after one of the 
  # saccades
  annots_fix = Annotations(
    onset=[sacc_on + sacc_dur for (sacc_on, sacc_dur) in zip(annots_sacc.onset,
                                                             annots_sacc.duration)],
    duration=np.ones(5),
    description=['fixation'] * 5
  )

  # We can now grab the IDs of the preceding saccad annotations and add them to the
  # fixation event data
  incoming_saccade_ids = annots_sacc._id
  for fix_ed, sacc_id in zip(fixation_eventdata, incoming_saccade_ids):
    fix_ed['incoming_saccade_id'] = sacc_id
  
  # Now we can combine the saccade and fixation annotations
  all_annot = annots_sacc + annots_fix
  event_data_concat = saccade_eventdata + fixation_eventdata   
  # However the annotations are automatically sorted by onset time, while 
  # the event data are just concatenated.
  # So we need to sort the event data accordingly to be able to link them. 
  # For this, we can use the IDs of the annotations:
  ids_events_concat = all_annot._get_id(
    np.hstack([annots_sacc.onset, annots_fix.onset]),
    np.hstack([annots_sacc.duration, annots_fix.duration]),
    np.hstack([annots_sacc.description, annots_fix.description]))
  
  event_data_sorted = []
  for id in all_annot._id:
    idx = np.where(np.array(ids_events_concat) == id)[0][0]
    event_data_sorted.append(event_data_concat[idx])

  eye_events = EyeEvents(annots_sacc + annots_fix, 
                         event_data_sorted)

  print(eye_events._saccades)
  print(eye_events._fixations)

Output:

                               annot_id  sac_vmax sac_amplitude  sac_angle sac_startpos_x sac_startpos_y sac_endpos_x sac_endpos_y
0  8c1598e1-9e0b-474e-9b7f-fdf8be98927c      61.5            40        0.0              0              0           10          -10
0  9fa0d848-b4e4-4be0-883c-e3a3518c0eb0     123.0            30       72.0              1              1           11           -9
0  8a51b1fd-3fae-4f86-a0e6-1a2490dfc1e7     246.0            20      144.0              2              2           12           -8
0  45315260-ff15-414d-864b-aa02019fe59c     369.0            10      216.0              3              3           13           -7
0  b012a5df-3563-4b2f-8627-090e1e4d8b6b     492.0             0      288.0              4              4           14           -6
                               annot_id fix_avgpos_x fix_avgpos_y                   incoming_saccade_id
0  d6e4c467-e31f-47d8-8951-35bdfc51a744           10          -10  8c1598e1-9e0b-474e-9b7f-fdf8be98927c
0  a2eaf980-3eb9-4d6e-8634-e004b91878bf           11           -9  9fa0d848-b4e4-4be0-883c-e3a3518c0eb0
0  7658a43d-8d93-4035-a1c2-01fb8ce613d9           12           -8  8a51b1fd-3fae-4f86-a0e6-1a2490dfc1e7
0  3613bbd2-f20a-4167-b0b5-ed5904152583           13           -7  45315260-ff15-414d-864b-aa02019fe59c
0  cc112a2a-e056-4243-9dce-9d727b5505e0           14           -6  b012a5df-3563-4b2f-8627-090e1e4d8b6b

@eioe
Copy link
Contributor Author

eioe commented Nov 16, 2023

@drammock uf you have a sec to give this a look, that'd be great. Esp if the ._id route is something that you can envision and is something worth pursuing tomorrow. Thx!

@drammock
Copy link
Member

@eioe and I just had another chat about this. Here's my position after that:

  • epochs metadata is at most one dataframe per epochs object. If we're going to have metadata dataframes for Raw objects too, I think there should likewise be at most one of them per Raw.
    • the "sparsity" of the dataframe is not a big usability problem, pandas has nice support for null values for most data types, which means it will be trivial to filter out rows that have an NA value for, e.g., sacc_peak_vel in order to end up with only saccade rows.
  • the link between the epochs object and the metadata rows is based on epoch number. The idea of using annotation number (AKA annotation ID, hash, whatever) has a nice parallel.
  • I don't think it's actually a very good idea to use a hash. Annotation objects are user-editable (they can change onset, duration, description, and channel association). That would presumably change the hash, making bookkeeping harder (would need to be recomputed and updated elsewhere).
    • (also, the hashes take up a lot of horizontal space in the dataframe columns)
    • I would lean toward something like an iterator attached to each Annotations instance that assigns new (numeric?) identifiers to each annotation "span" that gets added to it.

Things to think about:

  • someone should at least skim the draft BIDS eyetracking proposal, and if possible make our implementation somewhat aligned with it (this may be as simple as "choose the same column names they are proposing". Don't go too overboard trying to conform though, it's still a draft.
  • what happens to the metadata when annotations are removed/deleted from a raw object? do the associated metadata rows get deleted? My inclination is "metadata has exactly one row per annotation span" so if the annotations are removed, the metadata is also removed. This will probably be controversial.

please feel free to raise objections or point out problems that I'm missing.

@eioe
Copy link
Contributor Author

eioe commented Nov 17, 2023

thx @drammock

epochs metadata is at most one dataframe per epochs object.

as discussed, aesthetically unpleasant, but fine for me. Will also reduce redundancy in code (which might be more important then a densly filled DF). On it.
However, I plead for a different name than metadata as this is used in BIDS already for other info (BIDS metadata). IMO, consistency with Epochs.metadata does not justify inconsistency with BIDS. Open for opinions.
How do people feel about EyeEvents ?

I don't think it's actually a very good idea to use a hash.

Agree. That's why I suggested uuid, which is independent of the other attributes.

hashes take up a lot of horizontal space in the dataframe

That's also true for the uuid though. Enough of a reason to reject it?

I would lean toward something like an iterator attached to each Annotations instance that assigns new (numeric?) identifiers to each annotation "span" that gets added to it.

Yea, I thought about that and also @britta-wstnr had suggested that, as the numbers are smaller.
Downsides that I see:

  • it's pretty difficult to avoid that two Annotations objects hand out the same identifier to different spans. The link between a specific metadata object and an Annotation object does not need to be exclusive. Multiple Annotations instances could point to the same metadata. Then, however, they need to use different _ids. (Also in the case where an annotation is created which does not have any metadata. Or shall it not get an ._id then?) How do they know about each other? Possible, but a bit cumbersome.
  • similar thing when adding two annotations objects: with uuids we can just append their _id attributes. With numeric _ids we need to implement logic which makes sure that there are no duplicates and handle the according cases. Does that then mean also to update one of the associated metadata objects? Do we always need to check whether an Annotation object has associated metadata when merging it with another one?
  • tl;dr does the upside of numeric _ids truely outweigh its challenges in comparison to uuidss?

@eioe
Copy link
Contributor Author

eioe commented Nov 17, 2023

someone should at least skim the draft BIDS eyetracking proposal

I did now skim read the according PR. And I do not think that is something we need to worry much in the context which we are discussing here. The kind of metadata we have in mind for now (saccade, fixation properties, ...) would either count as derivatives or sourcedata in the BIDS language and therefore not underlie the conventiones of the BEP/new standard. Therefore, there also doe not seem to be (much/any) talk about how to name the properties. I think this is nothing that should restrict us here. Also confirmed by @sappelhoff .

My inclination is "metadata has exactly one row per annotation span" so if the annotations are removed, the metadata is also removed.

Yea, this will be controversial. I think this boils down to how much the object that we are talking about is metadata which is clearly associated with a given Raw (or derived Epochs) or how much it is its own entity which can be linked (in parts) to other objects but also has a meaning on its own. To be fair though, the only use case that comes to my mind at the moment: One might want to calculate/plot the main sequence (or other saccadic/fixational behavioral properties) also from data which has been rejected for EEG analyses. I do not have a hard opinion on this point. Just, if we want to keep annotations and metadata in sync, this needs to be realized and there is no obvious way in sight for this as the Annotations do not have a pointer to the metadata (if we are doing this via ._id). In this regard in bring us back to questions in my last post.

@hoechenberger
Copy link
Member

However, I plead for a different name than metadata as this is used in BIDS already for other info (BIDS metadata). IMO, consistency with Epochs.metadata does not justify inconsistency with BIDS. Open for opinions.

Hi, I'm for consistency within MNE, and here "metadata" (which is a super generic term anyway) has a different meaning than in BIDS. If we were talking about MNE-BIDS I would support your proposal for choosing a different name, but for MNE, "metadata" is the logical choice in my opinion

@mscheltienne
Copy link
Member

+1 for metadata as well.

@eioe
Copy link
Contributor Author

eioe commented Nov 17, 2023

ok, another lively discussion and pivot. 🤭 Main participants this time @britta-wstnr @drammock @larsoner .
Conclusions:

  • the info shall live in a new attribute on Annotations
  • this shall be a list of dict (or None), each with keys of kind str and values of kind str, float, or None (enforced as this makes sure we do not have problems writting it to disk)
  • it will be a public attribute and part of the API
  • we will offer something like an annotation_data_to_dataframe() function which allows to extract these data in an easy fashion

Open questions:

  • naming: the most favored option in our discussion was data as it is shorter than metadata and its content actually is not very "meta". As of writing this, I started to like details a bit more (which only came to my mind afterwards) as I think it's a better description of what it is.
  • implementation:
    1. as we want to check/enforce key and value types, it will be a @property and we're using a setter? And/or a set_data(dict) method?
    2. What is the most reasonable default value? None or {}?

Opinions on any of the points welcome. Otherwise, I'll make a decision and take opinions on the PR.

@scott-huberty
Copy link
Contributor

2. What is the most reasonable default value? None or {}?

We should probably use None becasue {} is mutable, which can cause hard to debug.. bugs!

@drammock
Copy link
Member

as we want to check/enforce key and value types, it will be a @property and we're using a setter? And/or a set_data(dict) method?

@property won't be enough here to enforce key and value restrictions:

class Annotations:
    """Mock up of new `Annotations.data` attribute."""

    def __init__(self):
        self._data = dict()

    @property
    def data(self):
        return self._data


foo = Annotations()
# even without a @setter I can still add new (integer!) keys:
foo.data[3] = lambda: print("value can be a callable!")

print(foo.data)
# {3: <function <lambda> at 0x7fb58260f100>}

Upon init of the Annotations object we can validate whatever is passed in Annotations(..., data=list_of_dicts) but once it exists we can't stop users from adding whatever keys/values they want. So at minimum we will need a set_data() method and to document that users shouldn't directly modify it. But I worry that they'll notice it's a dict and just do it anyway. So that means either:

  1. instead of a dict, it's a subclass of dict that overrides __setitem__ to first do key/value validation and only then set the value, or
  2. make it a dataclass after all.

I just briefly chatted with @larsoner and we're both OK with either approach. I slightly prefer approach 1.

@eioe
Copy link
Contributor Author

eioe commented Nov 17, 2023

Thx for pointing this out. If I was to build an authoritarian regime I would not want to do it using python.

Option 1. sounds good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ENH sprint-2023 Issues reserved for the 2023 Intermediate Dev Training
Development

Successfully merging a pull request may close this issue.

7 participants