Annotate MR/PET (+CT) images with deidentification methods #1709

Open
effigies opened this issue Feb 16, 2024 · 10 comments
Labels
enhancement (New feature or request), MRI (For things that affect all MRI datatypes), PET

Comments

@effigies
Collaborator

effigies commented Feb 16, 2024

Originally posted by @CGSchwarzMayo in #666 (comment):


I came upon [#666] while looking for a .json field that I could populate to indicate that images had been de-faced, and with what software/version they had been de-faced. I think the de-facing field has changed a bit since 2020 and I'm hoping the BIDS group might be willing to revisit it.

Some major points of change in the past few years:

  1. De-facing is becoming much more common in research areas where it previously was not. In particular, most large brain aging and dementia imaging studies (e.g. ADNI, SCAN, A4, ALLFTD, MCSA) are de-facing where they did not before, and these, along with datasets that have been de-facing for longer periods such as HCP and UK-Biobank, make up a substantial portion of publicly available brain imaging datasets.
  2. De-facing is no longer limited to structural MRI. Rather, it is also being applied in PET (FDG, amyloid, tau, and others), CT, and a growing number of brain MRI sequences (T1, T2, FLAIR, T2*/GRE, and even perfusion/ASL). Diffusion and functional/BOLD MRI are the most prominent exceptions that, for now, are not known to be identifiable without de-facing, but most other common brain images are, at this point.
  3. De-faced images are increasingly stored and distributed in DICOM. This is the primary and only distribution method for most of the studies listed above. These DICOM are marked appropriately using DeidentificationMethod and DeidentificationMethodCodeSequence, but when they are converted to .nii, this information is lost because there is no .json field to transfer it to.
  4. A continued proliferation of de-facing software has led to a need to track not just whether images have been de-faced but with what software and with what software version, because data de-faced using different software may not be combinable.

One solution proposed here has been BIDS Derivatives. De-facing is actually discussed as the first example on the BIDS-Derivatives page: https://bids-specification.readthedocs.io/en/stable/derivatives/introduction.html. When de-facing is performed during the curation process, the image would be stored as if it were raw data, e.g. sub-01/anat/sub-01_T1w.nii.gz. From what I can tell, this solution makes the image indistinguishable from unmodified raw data. For most downstream consumers the de-faced data will be the only copy they ever see, so treating it as the original primary data is appropriate, but I don't see any standard fields in BIDS-specified .json files that would allow specifying the de-facing software or version used in a standardized and machine-readable format.

Without a standardized .json field, there is no place to store this information, which is critical to both users and database maintainers in maintaining and understanding the images. Adding such a field would also allow the information to be preserved when de-faced DICOM are converted to .nii via dcm2niix and similar programs, whereas currently it is lost because no BIDS fields have been standardized.

Would the BIDS maintainers be interested in re-visiting this idea with top-level fields for deface-software and deface-software-version? While I'm personally less invested in skull-stripping, I would envision the same argument for software+version would be appropriate for skull-strip related tags as well.

@effigies
Collaborator Author

The situation has definitely changed since this was opened. Last year, the following text was adopted:

> The following sections cover additions to and divergences from "raw" BIDS. Raw data are data that have been curated into BIDS from a non-BIDS source. If a dataset is derived from at least one other valid BIDS dataset, then it is a derivative dataset.
>
> A dataset that is skull-stripped or defaced prior to or during curation into BIDS is raw, by definition.

> 3. De-faced images are increasingly stored and distributed in DICOM. This is the primary and only distribution method for most of the studies listed above. These DICOM are marked appropriately using DeidentificationMethod and DeidentificationMethodCodeSequence, but when they are converted to .nii, this information is lost because there is no .json field to transfer it to.

I would be +1 for adding these DICOM fields to BIDS directly.

While this is useful for converting from de-identified DICOM, it would be good to have recommendations for how a curating tool should populate these fields. Something like:

A conversion or curation tool SHOULD append a description of the de-identification
process performed, along with the name and version of the tool.
For example:

```json
  "DeidentificationMethod": ["skull-strip; bids-curator-X (v1.0)"],
```

I don't have a strong opinion on what recommendation is to be made. Doing something similar to what is done in DICOM is probably best, but I haven't seen these.
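
The recommendation sketched above could look roughly like this in a curation tool. This is only a minimal sketch: the function name `append_deidentification_method` is hypothetical, and it assumes the list-of-strings form of `DeidentificationMethod` shown in the example, which has not been adopted by the specification.

```python
import json


def append_deidentification_method(sidecar, description, tool, version):
    """Return a copy of `sidecar` with one more DeidentificationMethod entry.

    Assumption: the field is a list of free-text strings of the form
    "<description>; <tool> (v<version>)", as in the example above.
    """
    entry = f"{description}; {tool} (v{version})"
    updated = dict(sidecar)
    existing = updated.get("DeidentificationMethod", [])
    # Tolerate a bare string left by an earlier tool or a DICOM converter.
    if isinstance(existing, str):
        existing = [existing]
    updated["DeidentificationMethod"] = [*existing, entry]
    return updated


sidecar = {"Manufacturer": "Siemens"}
sidecar = append_deidentification_method(
    sidecar, "skull-strip", "bids-curator-X", "1.0"
)
print(json.dumps(sidecar, indent=2))
```

Appending rather than overwriting keeps whatever an upstream step already recorded, which matters when a de-facing tool and a separate de-identifier both touch the same data.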

@effigies effigies added the enhancement (New feature or request) and MRI (For things that affect all MRI datatypes) labels Feb 16, 2024
@CGSchwarzMayo

Thank you for your support! I do want to note again that this isn't just MRI but also PET and CT, while the topic title and tag applied here are both MRI-specific.

> Doing something similar to what is done in DICOM is probably best, but I haven't seen these.

I'm far from a DICOM expert, but I tried to read the standards carefully and the way we've implemented in our deface tools (which are being applied to some large and significant open datasets in the wild) looks like this:

(0012, 0062) Patient Identity Removed            CS: 'YES'
(0012, 0063) De-identification Method            LO: 'mri_reface 0.3.3'
(0012, 0064)  De-identification Method Code Sequence   2 item(s) ---- 
   (0008, 0100) Code Value                          SH: '113102'
   (0008, 0102) Coding Scheme Designator            SH: 'DCM'
   (0008, 0103) Coding Scheme Version               SH: '01'
   (0008, 0104) Code Meaning                        LO: 'Clean Recognizable Visual Features Option'
   ---------
   (0008, 0100) Code Value                          SH: 'replace_recognizable'
   (0008, 0102) Coding Scheme Designator            SH: 'mri_reface'
   (0008, 0103) Coding Scheme Version               SH: '0.3.3'
   (0008, 0104) Code Meaning                        LO: 'Replace face, ears, and artifacts in air'

De-identification Method and De-identification Method Code Sequence definitely apply to more than just de-facing, i.e. removal of various DICOM tags is also coded here. The defined codes (values 113101-113112) are listed in this table: https://dicom.nema.org/medical/dicom/current/output/chtml/part16/chapter_d.html. Users can also define their own codes, as we've done in the second sequence block. We used the second (custom) sequence block because DICOM de-identifier software would often be run after de-facing and could easily overwrite De-identification Method with its own information, so using the sequence better ensured that the info would carry through.

Something that's gone through both a de-facer and a DICOM deidentifier could look like this:

(0012, 0062) Patient Identity Removed            CS: 'YES'
(0012, 0063) De-identification Method            LO: 'Per DICOM PS 3.15 AnnexE. Details in 0012,0064, mri_reface 0.3.3'
(0012, 0064)  De-identification Method Code Sequence   5 item(s) ---- 
   (0008, 0100) Code Value                          SH: '113100'
   (0008, 0102) Coding Scheme Designator            SH: 'DCM'
   (0008, 0104) Code Meaning                        LO: 'Basic Application Confidentiality Profile'
   ---------
   (0008, 0100) Code Value                          SH: '113107'
   (0008, 0102) Coding Scheme Designator            SH: 'DCM'
   (0008, 0104) Code Meaning                        LO: 'Retain Longitudinal Temporal Information Modified Dates Option'
   ---------
   (0008, 0100) Code Value                          SH: '113111'
   (0008, 0102) Coding Scheme Designator            SH: 'DCM'
   (0008, 0104) Code Meaning                        LO: 'Retain Safe Private Option'
   ---------
   (0008, 0100) Code Value                          SH: '113102'
   (0008, 0102) Coding Scheme Designator            SH: 'DCM'
   (0008, 0103) Coding Scheme Version               SH: '01'
   (0008, 0104) Code Meaning                        LO: 'Clean Recognizable Visual Features Option'
   ---------
   (0008, 0100) Code Value                          SH: 'replace_recognizable'
   (0008, 0102) Coding Scheme Designator            SH: 'mri_reface'
   (0008, 0103) Coding Scheme Version               SH: '0.3.3'
   (0008, 0104) Code Meaning                        LO: 'Replace face, ears, and artifacts in air'
   ---------

One issue for BIDS is that most of DICOM de-identification is irrelevant to the converted .nii. Things that affect the pixel data are relevant, but removal of PatientName and such are not. The "safe" thing for BIDS would be to serialize all of this information and store it, capturing both what's relevant and what's irrelevant, but that would be very long and difficult to parse. A less-verbose option could be to exclude the code sequences that shouldn't modify anything relevant to nii+json (from this example, ignore the blocks with 113100, 113107, 113111) and keep 113102 + the custom-defined (since there's no way to know if that will be relevant or irrelevant).
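
As a rough illustration of the less-verbose option described above, a converter could drop the code-sequence items that only concern header tags and keep the rest. This is a sketch under assumptions: the set of codes to drop is taken only from this one example (113100, 113107, 113111) and is not an exhaustive or agreed list, and the names `METADATA_ONLY_DCM_CODES` and `keep_image_relevant` are hypothetical.

```python
# Codes that this example treats as irrelevant to the converted image.
# Assumption: taken from the single example above, not a complete list.
METADATA_ONLY_DCM_CODES = {"113100", "113107", "113111"}


def keep_image_relevant(code_sequence):
    """Drop DCM-coded items known to affect only header tags.

    Items from any other coding scheme (e.g. tool-defined codes) are kept,
    because there is no way to know whether they touch the pixel data.
    """
    kept = []
    for item in code_sequence:
        is_dcm = item.get("CodingSchemeDesignator") == "DCM"
        if is_dcm and item.get("CodeValue") in METADATA_ONLY_DCM_CODES:
            continue
        kept.append(item)
    return kept


sequence = [
    {"CodeValue": "113100", "CodingSchemeDesignator": "DCM"},
    {"CodeValue": "113107", "CodingSchemeDesignator": "DCM"},
    {"CodeValue": "113111", "CodingSchemeDesignator": "DCM"},
    {"CodeValue": "113102", "CodingSchemeDesignator": "DCM"},
    {"CodeValue": "replace_recognizable", "CodingSchemeDesignator": "mri_reface"},
]
kept = keep_image_relevant(sequence)
print([item["CodeValue"] for item in kept])  # ['113102', 'replace_recognizable']
```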

A low-effort variant for BIDS could be to just create json fields that directly capture DICOM's Patient Identity Removed (boolean) and De-identification Method (string, non-standardized), ignoring De-identification Method Code Sequence because it's just too complex. This may not capture everything but it would still be a big improvement over not having any information captured at all. It could even be considered to skip the boolean PatientIdentityRemoved since basically every BIDS dataset will at least have PatientID re-coded, so it's always TRUE and therefore meaningless. With only DeidentificationMethod, there's just an unstandardized string, so it would be up to users to do whatever they like and it won't really be machine-parseable. The CodeSequence was designed to solve the latter, but it's not straightforward.

Are there any other areas in BIDS where DICOM CodeSequences were captured? That could offer some guidance. Otherwise it's perfectly fine for BIDS to add a direct capture/translation of "(0012, 0063) De-identification Method" and solve 90% of this need with very small effort.

@effigies effigies changed the title from "Annotate MR images with deidentification methods" to "Annotate MR/PET (+CT) images with deidentification methods" Feb 16, 2024
@effigies effigies added the PET label Feb 16, 2024
@effigies
Collaborator Author

Cool, thanks for sharing that. So an absolutely minimal-effort translation would be:

  "DeidentificationMethod": "Per DICOM PS 3.15 AnnexE. Details in 0012,0064, mri_reface 0.3.3",
  "DeidentificationMethodCodeSequence": [
    {
      "CodeValue": "113100",
      "CodingSchemeDesignator": "DCM",
      "CodeMeaning": "Basic Application Confidentiality Profile"
    },
    {
      "CodeValue": "113107",
      "CodingSchemeDesignator": "DCM",
      "CodeMeaning": "Retain Longitudinal Temporal Information Modified Dates Option"
    },
    {
      "CodeValue": "113111",
      "CodingSchemeDesignator": "DCM",
      "CodeMeaning": "Retain Safe Private Option"
    },
    {
      "CodeValue": "113102",
      "CodingSchemeDesignator": "DCM",
      "CodingSchemeVersion": "01",
      "CodeMeaning": "Clean Recognizable Visual Features Option"
    },
    {
      "CodeValue": "replace_recognizable",
      "CodingSchemeDesignator": "mri_reface",
      "CodingSchemeVersion": "0.3.3",
      "CodeMeaning": "Replace face, ears, and artifacts in air"
    }
  ]

It looks like we could recommend that tools add or append a comma-separated short description to DeidentificationMethod and a structured description to DeidentificationMethodCodeSequence. We could also update these field names to be a little less awkward. For example,

    {
      "Code": "replace_recognizable",
      "Designator": "mri_reface",
      "Version": "0.3.3",
      "Description": "Replace face, ears, and artifacts in air"
    }
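
If shorter names like these were chosen, a converter could rename the DICOM keys mechanically. A minimal sketch, assuming the four short names from the example above (they are only a suggestion in this thread, not adopted names); `SHORT_NAMES` and `shorten_keys` are hypothetical:

```python
# Mapping from DICOM keywords to the shorter names floated above.
# Assumption: these short names are a suggestion only.
SHORT_NAMES = {
    "CodeValue": "Code",
    "CodingSchemeDesignator": "Designator",
    "CodingSchemeVersion": "Version",
    "CodeMeaning": "Description",
}


def shorten_keys(item):
    """Rename known DICOM keys; leave any unrecognized key untouched."""
    return {SHORT_NAMES.get(key, key): value for key, value in item.items()}


dicom_item = {
    "CodeValue": "replace_recognizable",
    "CodingSchemeDesignator": "mri_reface",
    "CodingSchemeVersion": "0.3.3",
    "CodeMeaning": "Replace face, ears, and artifacts in air",
}
short_item = shorten_keys(dicom_item)
print(short_item)
```

Because the mapping is one-to-one, it can be inverted to recover the DICOM keywords, so nothing is lost by shortening.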

> Are there any other areas in BIDS where DICOM CodeSequences were captured? That could offer some guidance.

I don't think so.

> One issue for BIDS is that most of DICOM de-identification is irrelevant to the converted .nii. Things that affect the pixel data are relevant, but removal of PatientName and such are not. The "safe" thing for BIDS would be to serialize all of this information and store it, capturing both what's relevant and what's irrelevant, but that would be very long and difficult to parse.

IMO it would be reasonable for a DICOM converter to blacklist known "uninteresting" fields, but I think the standard is plenty verbose without dictating those decisions.

> I do want to note again that this isn't just MRI but also PET and CT, while the topic title and tag applied here are both MRI-specific.

I added PET, but CT is not yet in BIDS. There is a moribund BEP (https://bids.neuroimaging.io/bep024) that could be revived. It seems short enough that it should not be a heavy lift. It needs a champion that is familiar with CT to wrap up the discussions and bring it into the main spec.

@CGSchwarzMayo

Thanks Chris! I was thinking too rigidly about BIDS fields as single strings rather than using a more complex .json structure to capture sequences. In that case I think I agree most with the "minimal-effort translation" example you proposed. BIDS could easily drop PatientIdentityRemoved while keeping both DeidentificationMethod and DeidentificationMethodCodeSequence in totality. Whether to drop some specific numeric codes would really then be up to Chris Rorden et al rather than BIDS, but my guess is they'd choose to keep them all as the "safe" option.
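
The selection described above (keep both fields whole, drop PatientIdentityRemoved) might be sketched as follows. It assumes the converter already holds the header as a plain dictionary keyed by DICOM keyword, which is a simplification; `CARRIED_FIELDS` and `deidentification_sidecar_fields` are hypothetical names, not part of dcm2niix or any BIDS tool.

```python
# DICOM keywords carried over verbatim. PatientIdentityRemoved is left out
# on purpose, as suggested in the discussion above.
CARRIED_FIELDS = ("DeidentificationMethod", "DeidentificationMethodCodeSequence")


def deidentification_sidecar_fields(header):
    """Pick the de-identification fields to write to the .json sidecar."""
    return {name: header[name] for name in CARRIED_FIELDS if name in header}


header = {
    "PatientIdentityRemoved": "YES",
    "DeidentificationMethod": "mri_reface 0.3.3",
    "DeidentificationMethodCodeSequence": [
        {
            "CodeValue": "113102",
            "CodingSchemeDesignator": "DCM",
            "CodingSchemeVersion": "01",
            "CodeMeaning": "Clean Recognizable Visual Features Option",
        },
    ],
}
fields = deidentification_sidecar_fields(header)
print(sorted(fields))
```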

I have no objection to also shortening some of those fields as in your second example. Does BIDS have any general guidance on keeping field names verbatim from DICOM vs making them shorter/friendlier? That general concept seems like it'd have already come up and been decided at some point.

Are we at the point of making a PR out of this, or should it sit open for discussion for a while first?

@effigies
Collaborator Author

effigies commented Feb 16, 2024

> Are we at the point of making a PR out of this, or should it sit open for discussion for a while first?

We definitely want to give people some time to chime in, but a PR can help make further discussion more concrete. I don't think I'm up to writing a PR yet, but I don't want to stop you. Just be aware that conversations can change direction dramatically, and writing a PR does not necessarily mean acceptance (see previous thread).

@effigies
Collaborator Author

> Does BIDS have any general guidance on keeping field names verbatim from DICOM vs making them shorter/friendlier?

I think in general we try to keep things pretty close when there's a 1-1 correspondence. On the other hand, DICOM seems to have a kind of global namespace, whereas BIDS is pretty comfortable with reusing fields (like "Description") when they fit in multiple places, especially in nested structures.

So I would probably encourage keeping the two top-level as direct DICOM. I could go either way for the others.

> That general concept seems like it'd have already come up and been decided at some point.

You'd think so. If it's been written down, I can't readily find it. @yarikoptic might be the most likely to know for sure.

@CGSchwarzMayo

Another mild point of support: I've learned that at least some Siemens MRI scanners have built-in "anonymization" options that strip some tags, and these also fill DeidentificationMethod and DeidentificationMethodCodeSequence. These DICOM tags are being used out in the wild beyond just de-facing, and it would be really great if the BIDS json files could capture that information.

@effigies
Collaborator Author

effigies commented Mar 8, 2024

I suppose it's been enough time for people to raise objections. Would you be up to drafting some text or writing a PR?

@CGSchwarzMayo

Yes, but I am away for the next couple of weeks. I can start to work on it after I return. Thanks for your continued interest and support!

@CGSchwarzMayo

CGSchwarzMayo commented Apr 12, 2024

I'm having a hard time figuring out how to officially link them, but I created PR #1772 for this.
