
predict-idlab/data-quality-challenges-wearables


🔍 Addressing Data Quality Challenges in Remote Wearable Monitoring

Codebase & further details for the paper:

Addressing Data Quality Challenges in Observational Ambulatory Studies: Analysis, methodologies and practical solutions for wrist-worn wearable monitoring

In this project, we address data quality challenges encountered in remote wearable monitoring using two datasets:

  1. ETRI Lifelog 2020:
    https://nanum.etri.re.kr/share/schung/ETRILifelogDataset2020?lang=En_us

  2. mBrain21:
    https://www.kaggle.com/datasets/jonvdrdo/mbrain21/data

For each identified challenge, denoted as C<ID>, we provide a dedicated notebook that demonstrates effective countermeasures against that challenge.


📰 How the repository is structured

├── code_utils       <- module containing all shared code
│   ├── empatica     <- Empatica E4 specific code (signal processing pipelines)
│   ├── etri         <- ETRI specific code (data parsing, visualization, dashboard)
│   ├── mbrain       <- mBrain specific code (data parsing, visualization, dashboard)
│   └── utils        <- utility code (dashboard, dataframes, interaction analysis)
├── loc_data         <- local data folder in which intermediate data is stored
└── notebooks        <- ETRI- and mBrain-specific notebooks
    ├── etri
    └── mBrain

🛠️ Installation

This repository uses Poetry as its dependency manager. The dependencies are specified in the pyproject.toml and poetry.lock files.

You can install the dependencies in your Python environment with the following steps:

  1. Install Poetry: https://python-poetry.org/docs/#installation
  2. Activate your Poetry environment by calling poetry shell (on Poetry 2.0 or later, this command is provided by the poetry-plugin-shell plugin)
  3. Install the dependencies by calling poetry install

🗃️ How to acquire the data

ETRI Lifelog 2020

The ETRI Lifelog 2020 dataset is available at https://nanum.etri.re.kr/share/schung/ETRILifelogDataset2020?lang=En_us.

To download the dataset, first create an account on the ETRI Nanum website. Then fill in the license agreement form; once it is approved, you can download the dataset via the web platform.

mBrain21

A subset of the mBrain21 dataset is available on Kaggle Datasets and can be downloaded with the Kaggle CLI:

kaggle datasets download -d jonvdrdo/mbrain21

Using this repository

Extend the hostname if-statement in the path_conf.py file with your machine's hostname, and configure the paths to the mBrain and ETRI datasets for that hostname.
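For illustration, a hostname switch of this kind could look roughly like the sketch below. It is not the repository's path_conf.py: the hostnames ("my-laptop", "lab-server"), the folder names and the get_data_paths function are hypothetical placeholders that you would replace with your own.

```python
from __future__ import annotations

import socket
from pathlib import Path


def get_data_paths(hostname: str | None = None) -> dict[str, Path]:
    """Return the dataset folders for this machine (hypothetical sketch, not path_conf.py)."""
    hostname = hostname or socket.gethostname()
    # One branch per machine; hostnames and paths are placeholders.
    if hostname == "my-laptop":
        root = Path("/home/me/data")
    elif hostname == "lab-server":
        root = Path("/data")
    else:
        raise ValueError(f"Unknown hostname {hostname!r}; add a branch for it first")
    return {"etri": root / "etri_lifelog_2020", "mbrain": root / "mbrain21"}
```

Calling get_data_paths() without arguments uses socket.gethostname(), so the same code resolves to the right folders on every configured machine and fails loudly on a machine that has not been added yet.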

✨ Challenges & features

Below, we list a selection of the challenges and features exemplified in this repository.

📷 Dashboards

This section elaborates on the longitudinal time series visualization dashboards for both the ETRI and mBrain datasets.

As shown in the figures below, each dashboard has a left column with selection boxes. The general flow to visualize a specific time-series excerpt is as follows:

  • Select a folder (in our case, all data from the ETRI and mBrain datasets are stored in the same folder, so there is only one option to choose from)
  • Select a user (e.g., user30 for the ETRI dataset)

Note: after selecting a folder and user, the time-span selection is updated to the time span available for the selected user-folder combination.

  • Select sensors (e.g., 'E4 accelerometer' and 'E4 temperature')

Finally, press the run interact button to visualize the selection.
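The time-span update described in the note boils down to taking the earliest and latest timestamps available for the selected user-folder combination. The following is a minimal plain-Python sketch of that idea; the in-memory INDEX, the folder name "etri_data", the dummy timestamps and the available_time_span function are all hypothetical and do not reflect the dashboard's actual data structures.

```python
from datetime import datetime

# Hypothetical index: (folder, user) -> {sensor name: sample timestamps}.
# The timestamps are dummy values for illustration only.
INDEX = {
    ("etri_data", "user30"): {
        "E4 accelerometer": [datetime(2020, 8, 31, 9, 0), datetime(2020, 9, 2, 18, 30)],
        "E4 temperature": [datetime(2020, 8, 30, 22, 15), datetime(2020, 9, 1, 7, 45)],
    },
}


def available_time_span(folder, user, sensors=None):
    """Return (start, end) covering the data of the selected sensors (all sensors by default)."""
    per_sensor = INDEX[(folder, user)]
    selected = sensors or list(per_sensor)
    timestamps = [ts for name in selected for ts in per_sensor[name]]
    if not timestamps:
        raise ValueError("no data available for this selection")
    return min(timestamps), max(timestamps)
```

Because sensors defaults to all sensors, the span can be computed as soon as a folder and user are chosen, which matches the order of the selection steps.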

ETRI

Once the ETRI dataset has been downloaded and parsed via the ETRI parsing notebook, the corresponding dashboard script can be used to explore and analyse the data. Run the dashboard with the following command (after activating the Poetry shell):

python code_utils/etri/dashboard.py

The output should show the following:

Dash is running on http://0.0.0.0:<PORT>

The dashboard screenshot below visualizes both the wearable data and the application event labels. This participant tends to be alone more often in the evenings (light-blue shaded area in the lower row of the upper subplot). On weekends (gray shaded areas), this participant tends to be alone and to spend a lot of time at home.

mBrain

The dashboard can be run with the following command (after activating the Poetry shell):

python code_utils/mbrain/dashboard.py

The output will show the following:

Dash is running on http://0.0.0.0:<PORT>

The screenshot below shows the mBrain dashboard. As the selection box on the left indicates, the dashboard displays the participant's headache timeline, along with the Empatica E4 accelerometer signal and the smartphone light data. Hovering over a headache event, as in the upper plot, reveals the associated characteristics of that headache event.

⌚ Off-wrist detection

Non-wear detection for the wearable is demonstrated in the C5.1_off_wrist_detection notebook.

Moreover, the C7_missing_data notebook demonstrates how this off-wrist pipeline can be used to remove non-wear bouts as a preprocessing step.
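As a rough illustration of removing non-wear bouts as a preprocessing step, the sketch below flags samples as non-wear with a single skin-temperature threshold and discards only runs that last at least a minimum number of samples. The 30 °C threshold, the minimum bout length and the single-sensor rule are hypothetical choices; this is not the Böttcher et al. (2022) pipeline used in the notebooks.

```python
def nonwear_bouts(is_nonwear, min_len):
    """Return (start, end) index pairs (end exclusive) of non-wear runs of at least min_len samples."""
    bouts, start = [], None
    for i, flag in enumerate(is_nonwear):
        if flag and start is None:
            start = i  # a non-wear run begins
        elif not flag and start is not None:
            if i - start >= min_len:
                bouts.append((start, i))
            start = None  # the run ended (kept only if long enough)
    # Close a run that lasts until the end of the recording.
    if start is not None and len(is_nonwear) - start >= min_len:
        bouts.append((start, len(is_nonwear)))
    return bouts


def remove_nonwear(values, skin_temp, temp_threshold=30.0, min_len=3):
    """Drop samples inside sufficiently long low-temperature bouts (assumed off-wrist)."""
    if len(values) != len(skin_temp):
        raise ValueError("values and skin_temp must have the same length")
    # Hypothetical rule: skin temperature below the threshold suggests off-wrist.
    is_nonwear = [t < temp_threshold for t in skin_temp]
    drop = set()
    for start, end in nonwear_bouts(is_nonwear, min_len):
        drop.update(range(start, end))
    return [v for i, v in enumerate(values) if i not in drop]
```

Requiring a minimum bout length keeps brief dips (e.g., a single cold sample) in the data, so the recording is not fragmented by short false positives.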

The screenshot below shows the off-wrist pipeline devised by Böttcher et al. (2022).

✒️ Data annotation

The C5.1_label_off_wrist mBrain notebook demonstrates how large bouts of time-series data can be annotated using plotly-resampler.

The demo below shows how this annotation tool can be used to label off-wrist periods.
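Annotations of this kind are essentially a list of labeled time intervals. The hypothetical sketch below turns such intervals into per-sample labels, for example to evaluate an off-wrist detector; the (start, end, label) interval format, the "off-wrist"/"on-wrist" label names and the intervals_to_labels function are assumptions, not the notebook's actual output format.

```python
from bisect import bisect_right
from datetime import datetime


def intervals_to_labels(timestamps, intervals, default="on-wrist"):
    """Label each timestamp with the label of the [start, end) interval containing it, else default.

    Assumes the (start, end, label) intervals do not overlap.
    """
    intervals = sorted(intervals)
    starts = [start for start, _, _ in intervals]
    labels = []
    for ts in timestamps:
        i = bisect_right(starts, ts) - 1  # last interval starting at or before ts
        if i >= 0 and ts < intervals[i][1]:
            labels.append(intervals[i][2])
        else:
            labels.append(default)
    return labels


# Example: two hours of annotated off-wrist time on an hourly series.
hours = [datetime(2021, 1, 1, h) for h in range(6)]
annotation = [(datetime(2021, 1, 1, 1), datetime(2021, 1, 1, 3), "off-wrist")]
print(intervals_to_labels(hours, annotation))
# ['on-wrist', 'off-wrist', 'off-wrist', 'on-wrist', 'on-wrist', 'on-wrist']
```

Binary search over the sorted interval starts keeps the lookup cheap even for long recordings with many samples.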

📖 Citation

@article{van2024addressing,
  title={Addressing Data Quality Challenges in Observational Ambulatory Studies: Analysis, Methodologies and Practical Solutions for Wrist-worn Wearable Monitoring},
  author={Van Der Donckt, Jonas and Vandenbussche, Nicolas and Van Der Donckt, Jeroen and Chen, Stephanie and Stojchevska, Marija and De Brouwer, Mathias and Steenwinckel, Bram and Paemeleire, Koen and Ongenae, Femke and Van Hoecke, Sofie},
  journal={arXiv preprint arXiv:2401.13518},
  year={2024}
}

📝 License

The code is available under the imec license.


👤 Jonas Van Der Donckt
