Annotate MR/PET (+CT) images with deidentification methods #1709

effigies · 2024-02-16T19:02:55Z

Originally posted by @CGSchwarzMayo in #666 (comment):

I came upon [#666] while looking for a .json field that I could populate to indicate that images had been de-faced, and with what software/version they had been de-faced. I think the de-facing field has changed a bit since 2020 and I'm hoping the BIDS group might be willing to revisit it.

Some major points of change in the past few years:

1. De-facing is becoming much more common in research areas it didn't used to be. In particular, most large brain aging and dementia imaging studies (e.g. ADNI, SCAN, A4, ALLFTD, MCSA) are de-facing where they did not before, and these, along with datasets that have been de-facing for longer periods such as HCP and UK-Biobank, make up a substantial portion of publicly available brain imaging datasets.
2. De-facing is no longer limited to structural MRI. Rather, it is also being applied in PET (FDG, amyloid, tau, and others), CT, and a growing number of brain MRI sequences (T1, T2, FLAIR, T2*/GRE, and even perfusion/ASL). Diffusion and functional/BOLD MRI are the most prominent exceptions that, for now, are not known to be identifiable without de-facing, but most other common brain images are, at this point.
3. De-faced images are increasingly stored and distributed in DICOM. This is the primary and only distribution method for most of the studies listed above. These DICOM are marked appropriately using DeidentificationMethod and DeidentificationMethodCodeSequence, but when they are converted to .nii, this information is lost because there is no .json field to transfer it to.
4. A continued proliferation of de-facing software has led to a need to track not just whether images have been de-faced but with what software and with what software version, because data de-faced using different software may not be combinable.

One solution proposed here has been BIDS Derivatives. De-facing is actually discussed as the first example on the BIDS-Derivatives page: https://bids-specification.readthedocs.io/en/stable/derivatives/introduction.html. When de-facing is performed during the curation process, the image would be stored as if it were raw data, e.g. sub-01/anat/sub-01_T1w.nii.gz. From what I can tell, this solution makes the image indistinguishable from unmodified raw data. For most downstream consumers the de-faced data will be the only copy they ever see, so treating it as the original primary data is appropriate. However, I don't see any standard fields in BIDS-specified .json files that would allow specifying the de-facing software or version used in a standardized and machine-readable format.

Without a standardized .json field, there is no place to store this information that is critical to both users and database maintainers in maintaining and understanding the images. Adding such a field would also allow its preservation when de-faced DICOM are converted to .nii via dcm2niix and similar programs, vs. now the information is lost because no BIDS fields have been standardized.

Would the BIDS maintainers be interested in re-visiting this idea with top-level fields for deface-software and deface-software-version? While I'm personally less invested in skull-stripping, I would envision the same argument for software+version would be appropriate for skull-strip related tags as well.

effigies · 2024-02-16T19:03:40Z

The situation has definitely changed since this was opened. Last year, the following text was adopted:

> The following sections cover additions to and divergences from "raw" BIDS. Raw data are data that have been curated into BIDS from a non-BIDS source. If a dataset is derived from at least one other valid BIDS dataset, then it is a derivative dataset.
>
> A dataset that is skull-stripped or defaced prior to or during curation into BIDS is raw, by definition.

> 3. De-faced images are increasingly stored and distributed in DICOM. This is the primary and only distribution method for most of the studies listed above. These DICOM are marked appropriately using DeidentificationMethod and DeidentificationMethodCodeSequence, but when they are converted to .nii, this information is lost because there is no .json field to transfer it to.

I would be +1 for adding these DICOM fields to BIDS directly.

While this is useful for converting from de-identified DICOM, it would be good to have recommendations for how a curating tool should populate these fields. Something like:

A conversion or curation tool SHOULD append a description of the de-identification
process performed, along with the name and version of the tool.
For example:

```json
  "DeidentificationMethod": ["skull-strip; bids-curator-X (v1.0)"],
```

I don't have a strong opinion on what recommendation is to be made. Doing something similar to what is done in DICOM is probably best, but I haven't seen these.
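A minimal sketch of how a curation tool might follow the recommendation above, assuming the list-of-strings form and the `process; tool (vX)` entry format from the example. The helper name `append_deid_method` and the sample sidecar contents are hypothetical, and nothing here is adopted BIDS text.

```python
import copy


def append_deid_method(sidecar, process, tool, version):
    """Return a copy of `sidecar` with one more DeidentificationMethod entry.

    Hypothetical helper: the entry format mirrors the example in this
    comment and is an assumption, not a standardized syntax.
    """
    updated = copy.deepcopy(sidecar)
    existing = updated.get("DeidentificationMethod", [])
    # Tolerate a bare string, as a converter copying the DICOM tag might write.
    if isinstance(existing, str):
        existing = [existing]
    entry = f"{process}; {tool} (v{version})"
    updated["DeidentificationMethod"] = list(existing) + [entry]
    return updated


# Sidecar as it might look after conversion from de-faced DICOM (illustrative).
sidecar = {"MagneticFieldStrength": 3, "DeidentificationMethod": "mri_reface 0.3.3"}
result = append_deid_method(sidecar, "skull-strip", "bids-curator-X", "1.0")
```

Appending rather than overwriting keeps earlier steps visible, which matters when de-facing and later curation are done by different tools.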

CGSchwarzMayo · 2024-02-16T19:34:58Z

Thank you for your support! I do want to note again that this isn't just MRI but also PET and CT, while the topic title and tag applied here are both MRI-specific.

> Doing something similar to what is done in DICOM is probably best, but I haven't seen these.

I'm far from a DICOM expert, but I tried to read the standards carefully and the way we've implemented in our deface tools (which are being applied to some large and significant open datasets in the wild) looks like this:

(0012, 0062) Patient Identity Removed            CS: 'YES'
(0012, 0063) De-identification Method            LO: 'mri_reface 0.3.3'
(0012, 0064)  De-identification Method Code Sequence   2 item(s) ---- 
   (0008, 0100) Code Value                          SH: '113102'
   (0008, 0102) Coding Scheme Designator            SH: 'DCM'
   (0008, 0103) Coding Scheme Version               SH: '01'
   (0008, 0104) Code Meaning                        LO: 'Clean Recognizable Visual Features Option'
   ---------
   (0008, 0100) Code Value                          SH: 'replace_recognizable'
   (0008, 0102) Coding Scheme Designator            SH: 'mri_reface'
   (0008, 0103) Coding Scheme Version               SH: '0.3.3'
   (0008, 0104) Code Meaning                        LO: 'Replace face, ears, and artifacts in air'

De-identification Method and De-identification Method Code Sequence definitely apply to more than just de-facing, i.e. removal of various DICOM tags is also coded here. A table of the defined codes is in this table, values 113101-113112 https://dicom.nema.org/medical/dicom/current/output/chtml/part16/chapter_d.html. Users can also define their own codes, as we've done in the second sequence-block. We used the second (custom) sequence block because DICOM de-identifier software would often be run after de-facing and could easily overwrite De-identificationMethod with their own information, so using the sequence better ensured that the info would carry through.

Something that's gone through both a de-facer and a DICOM deidentifier could look like this:

(0012, 0062) Patient Identity Removed            CS: 'YES'
(0012, 0063) De-identification Method            LO: 'Per DICOM PS 3.15 AnnexE. Details in 0012,0064, mri_reface 0.3.3'
(0012, 0064)  De-identification Method Code Sequence   5 item(s) ---- 
   (0008, 0100) Code Value                          SH: '113100'
   (0008, 0102) Coding Scheme Designator            SH: 'DCM'
   (0008, 0104) Code Meaning                        LO: 'Basic Application Confidentiality Profile'
   ---------
   (0008, 0100) Code Value                          SH: '113107'
   (0008, 0102) Coding Scheme Designator            SH: 'DCM'
   (0008, 0104) Code Meaning                        LO: 'Retain Longitudinal Temporal Information Modified Dates Option'
   ---------
   (0008, 0100) Code Value                          SH: '113111'
   (0008, 0102) Coding Scheme Designator            SH: 'DCM'
   (0008, 0104) Code Meaning                        LO: 'Retain Safe Private Option'
   ---------
   (0008, 0100) Code Value                          SH: '113102'
   (0008, 0102) Coding Scheme Designator            SH: 'DCM'
   (0008, 0103) Coding Scheme Version               SH: '01'
   (0008, 0104) Code Meaning                        LO: 'Clean Recognizable Visual Features Option'
   ---------
   (0008, 0100) Code Value                          SH: 'replace_recognizable'
   (0008, 0102) Coding Scheme Designator            SH: 'mri_reface'
   (0008, 0103) Coding Scheme Version               SH: '0.3.3'
   (0008, 0104) Code Meaning                        LO: 'Replace face, ears, and artifacts in air'
   ---------

One issue for BIDS is that most of DICOM de-identification is irrelevant to the converted .nii. Things that affect the pixel data are relevant, but removal of PatientName and such are not. The "safe" thing for BIDS would be to serialize all of this information and store it, capturing both what's relevant and what's irrelevant, but that would be very long and difficult to parse. A less-verbose option could be to exclude the code sequences that shouldn't modify anything relevant to nii+json (from this example, ignore the blocks with 113100, 113107, 113111) and keep 113102 + the custom-defined (since there's no way to know if that will be relevant or irrelevant).
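The less-verbose option can be sketched as a filter over a serialized code sequence. This assumes each item is a dict keyed by the DICOM attribute names (`CodeValue`, `CodingSchemeDesignator`, and so on) and that the three codes named above are the only ones treated as header-only. The names `HEADER_ONLY_DCM_CODES` and `keep_image_relevant` are hypothetical.

```python
# DCM codes from the example above that should not change anything relevant
# to nii+json. This set is an assumption taken from this thread, not an
# authoritative list.
HEADER_ONLY_DCM_CODES = {"113100", "113107", "113111"}


def keep_image_relevant(code_sequence):
    """Drop known header-only DCM items; keep everything else.

    Custom (non-DCM) items are always kept, since there is no way to know
    whether they affect the pixel data.
    """
    kept = []
    for item in code_sequence:
        is_dcm = item.get("CodingSchemeDesignator") == "DCM"
        if is_dcm and item.get("CodeValue") in HEADER_ONLY_DCM_CODES:
            continue
        kept.append(item)
    return kept


example = [
    {"CodeValue": "113100", "CodingSchemeDesignator": "DCM"},
    {"CodeValue": "113107", "CodingSchemeDesignator": "DCM"},
    {"CodeValue": "113111", "CodingSchemeDesignator": "DCM"},
    {"CodeValue": "113102", "CodingSchemeDesignator": "DCM"},
    {"CodeValue": "replace_recognizable", "CodingSchemeDesignator": "mri_reface"},
]
filtered = keep_image_relevant(example)
```

With the five-item example, only the `113102` item and the custom `mri_reface` item remain.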

A low-effort variant for BIDS could be to just create json fields that directly capture DICOM's Patient Identity Removed (boolean) and De-identification Method (string, non-standardized), ignoring De-identification Method Code Sequence because it's just too complex. This may not capture everything but it would still be a big improvement over not having any information captured at all. It could even be considered to skip the boolean PatientIdentityRemoved since basically every BIDS dataset will at least have PatientID re-coded, so it's always TRUE and therefore meaningless. With only DeIdentificationMethod, there's just an unstandardized string, so it would be up to users to do whatever they like and it won't really be machine-parseable. The CodeSequence was designed to solve the latter, but it's not straightforward.

Are there any other areas in BIDS where DICOM CodeSequences were captured? That could offer some guidance. Otherwise it's perfectly fine for BIDS to add a direct capture/translation of "(0012, 0063) De-identification Method" and solve 90% of this need with very small effort.

effigies · 2024-02-16T20:06:40Z

Cool, thanks for sharing that. So an absolutely minimal-effort translation would be:

  "DeidentificationMethod": "Per DICOM PS 3.15 AnnexE. Details in 0012,0064, mri_reface 0.3.3",
  "DeidentificationMethodCodeSequence": [
    {
      "CodeValue": "113100",
      "CodingSchemeDesignator": "DCM",
      "CodeMeaning": "Basic Application Confidentiality Profile"
    },
    {
      "CodeValue": "113107",
      "CodingSchemeDesignator": "DCM",
      "CodeMeaning": "Retain Longitudinal Temporal Information Modified Dates Option"
    },
    {
      "CodeValue": "113111",
      "CodingSchemeDesignator": "DCM",
      "CodeMeaning": "Retain Safe Private Option"
    },
    {
      "CodeValue": "113102",
      "CodingSchemeDesignator": "DCM",
      "CodingSchemeVersion": "01",
      "CodeMeaning": "Clean Recognizable Visual Features Option"
    },
    {
      "CodeValue": "replace_recognizable",
      "CodingSchemeDesignator": "mri_reface",
      "CodingSchemeVersion": "0.3.3",
      "CodeMeaning": "Replace face, ears, and artifacts in air"
    }
  ]

It looks like we could recommend that tools add or append a comma-separated short description to DeidentificationMethod and a structured description to DeidentificationMethodCodeSequence. We could also update these field names to be a little less awkward. For example,

    {
      "Code": "replace_recognizable",
      "Designator": "mri_reface",
      "Version": "0.3.3",
      "Description": "Replace face, ears, and artifacts in air"
    }
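If the shorter names were adopted, a converter could translate each item with a simple key mapping. This is only a sketch: the mapping follows the example above, which is a suggestion in this thread rather than settled field names, and `SHORT_NAMES` and `shorten_keys` are hypothetical.

```python
# DICOM-style key -> shorter key, following the example above (not adopted).
SHORT_NAMES = {
    "CodeValue": "Code",
    "CodingSchemeDesignator": "Designator",
    "CodingSchemeVersion": "Version",
    "CodeMeaning": "Description",
}


def shorten_keys(item):
    """Rename known DICOM-style keys; leave any other key untouched."""
    return {SHORT_NAMES.get(key, key): value for key, value in item.items()}


dicom_style = {
    "CodeValue": "replace_recognizable",
    "CodingSchemeDesignator": "mri_reface",
    "CodingSchemeVersion": "0.3.3",
    "CodeMeaning": "Replace face, ears, and artifacts in air",
}
short_style = shorten_keys(dicom_style)
```

Items that lack `CodingSchemeVersion`, like several in the longer example, pass through with only the keys they have.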

> Are there any other areas in BIDS where DICOM CodeSequences were captured? That could offer some guidance.

I don't think so.

> One issue for BIDS is that most of DICOM de-identification is irrelevant to the converted .nii. Things that affect the pixel data are relevant, but removal of PatientName and such are not. The "safe" thing for BIDS would be to serialize all of this information and store it, capturing both what's relevant and what's irrelevant, but that would be very long and difficult to parse.

IMO it would be reasonable for a DICOM converter to blacklist known "uninteresting" fields, but I think the standard is plenty verbose without dictating those decisions.

> I do want to note again that this isn't just MRI but also PET and CT, while the topic title and tag applied here are both MRI-specific.

I added PET, but CT is not yet in BIDS. There is a moribund BEP (https://bids.neuroimaging.io/bep024) that could be revived. It seems short enough that it should not be a heavy lift. It needs a champion that is familiar with CT to wrap up the discussions and bring it into the main spec.

CGSchwarzMayo · 2024-02-16T20:14:15Z

Thanks Chris! I was thinking too rigidly about BIDS fields as single strings rather than using a more complex .json structure to capture sequences. In that case I think I agree most with the "minimal-effort translation" example you proposed. BIDS could easily drop PatientIdentityRemoved while keeping both DeidentificationMethod and DeidentificationMethodCodeSequence in totality. Whether to drop some specific numeric codes would really then be up to Chris Rorden et al rather than BIDS, but my guess is they'd choose to keep them all as the "safe" option.

I have no objection to also shortening some of those fields as in your second example. Does BIDS have any general guidance on keeping field names verbatim from DICOM vs making them shorter/friendlier? That general concept seems like it'd have already come up and been decided at some point.

Are we at the point of making a PR out of this, or should it sit open for discussion for a while first?

effigies · 2024-02-16T20:35:25Z

> Are we at the point of making a PR out of this, or should it sit open for discussion for a while first?

We definitely want to give people some time to chime in, but a PR can help make further discussion more concrete. I don't think I'm up to writing a PR yet, but I don't want to stop you. Just be aware that conversations can change direction dramatically, and writing a PR does not necessarily mean acceptance (see previous thread).

effigies · 2024-02-16T20:41:11Z

> Does BIDS have any general guidance on keeping field names verbatim from DICOM vs making them shorter/friendlier?

I think in general we try to keep things pretty close when there's a 1-1 correspondence. On the other hand, DICOM seems to have a kind of global namespace, where BIDS is pretty comfortable with reusing fields (like "Description") when they fit in multiple places, especially in nested structures.

So I would probably encourage keeping the two top-level as direct DICOM. I could go either way for the others.

> That general concept seems like it'd have already come up and been decided at some point.

You'd think so. If it's been written down, I can't readily find it. @yarikoptic might be the most likely to know for sure.

CGSchwarzMayo · 2024-03-01T15:07:49Z

Another mild point of support: I've learned that at least some Siemens MRI scanners have built-in "anonymization" options that strip some tags, and these also fill DeidentificationMethod and DeidentificationMethodCodeSequence. These DICOM tags are being used out in the wild beyond just de-facing, and it would be really great if the BIDS json files could capture that information.

effigies · 2024-03-08T03:21:45Z

I suppose it's been enough time for people to raise objections. Would you be up to drafting some text or writing a PR?

CGSchwarzMayo · 2024-03-08T09:00:57Z

Yes, but I am away for the next couple of weeks. I can start to work on it after I return. Thanks for your continued interest and support!

CGSchwarzMayo · 2024-04-12T14:54:02Z

I'm having a hard time figuring out how to officially link them, but I created PR #1772 for this.

effigies added the enhancement and MRI labels Feb 16, 2024

effigies changed the title ~~Annotate MR images with deidentification methods~~ Annotate MR/PET (+CT) images with deidentification methods Feb 16, 2024

effigies added the PET label Feb 16, 2024

CGSchwarzMayo mentioned this issue Apr 12, 2024

[ENH] Add metadata fields for DeIdentificationMethod/CodeSequence for MRI and PET #1772 (Open)

This was referenced Apr 17, 2024

Add DeidentificationMethod and DeidentificationMethodCodeSequence to json rordenlab/dcm2niix#812 (Closed)

Add DeidentificationMethod and DeidentificationMethodCodeSequence to json rordenlab/dcm2niix#813 (Open)
