
Documentation suggestion: recommendations for working with integer NIFTI images #1001

pauldmccarthy opened this issue Mar 11, 2021 · 2 comments

@pauldmccarthy (Contributor)

Hi @effigies and @matthew-brett (pinging you both directly, because you've had the most involvement in the relevant bits of the nibabel code).

A colleague of mine (very competent, but fairly new to nibabel and numpy) recently found himself in a very confusing situation, ultimately brought about by the scl_slope / scl_inter rescaling logic baked into nibabel. I'm wondering whether a section should be added to the nibabel documentation giving recommendations on how to work with NIFTI images of integral type.

A full example is given below, but in brief: he was working with a NIFTI1 image of type int32, but manipulating the data as float64 arrays in memory. He was repeatedly loading the image, modifying it, and saving it back out as an int32 image. After a couple of iterations, the floating point imprecision in the scl_slope and scl_inter fields had accumulated to the point that the integral values of his image data were no longer being reconstructed exactly, which caused problems when he tried to create and use a binary mask.

The problem boils down to a combination of two coding patterns that are in common use:

  1. Using get_fdata to retrieve the image data, so that it is always loaded and manipulated as float64, regardless of the storage type.

  2. When creating a new image from an existing image (e.g. after some processing step), using the header from the original image to preserve the NIFTI header metadata (not that there's much of use in there, but that's another matter):

    image1 = nib.load('image1.nii.gz')
    data   = image1.get_fdata()
    
    # do stuff to data
    
    # Save new image, preserving header metadata
    # The catch here is that the data is float64,
    # but the header specifies int32 (or some
    # integral type), so the data gets re-scaled
    # to use up the full range of the storage type.
    image2 = nib.Nifti1Image(data, None, image1.header)
    image2.to_filename('image2.nii.gz')

Using the above patterns with images of integer type can lead to problems, as demonstrated in the example below. So what I'm wondering is whether a section should be added to the nibabel documentation, with some recommended strategies for working with integral data. I'd be happy to attempt a first pass if you think it's a good idea.

I can think of a few solutions/patterns for avoiding this problem, any of which could be the best approach depending on the situation (rough sketches follow the list):

  • When loading, read the data in its storage type
  • When loading, round rather than cast, to avoid truncation
  • When writing, convert the data to the desired storage type
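To make these concrete, here are rough sketches of each pattern (hedged; the filename and the int32 dtype are just examples carried over from the reproduction below):

import numpy as np
import nibabel as nib

image = nib.load('image1.nii.gz')

# 1. Read the data in its on-disk storage type. Note that if
#    scl_slope/scl_inter are set on the image, the scaled data
#    will still come back as floats.
data = np.asanyarray(image.dataobj)

# 2. Round, rather than cast, so that a value like 0.99999999988
#    becomes 1 instead of being truncated to 0.
data = np.rint(image.get_fdata()).astype(image.get_data_dtype())

# 3. When writing, convert the data to the desired storage type
#    and record it in the header, so no rescaling is needed.
out = nib.Nifti1Image(data.astype(np.int32), image.affine)
out.set_data_dtype(np.int32)
out.to_filename('image2.nii.gz')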

To reproduce the situation that my colleague found himself in:

import numpy  as np
import nibabel as nib

# Create some random int32 data
# and save it as a NIFTI image
nib.Nifti1Image(
    np.random.randint(0, 100, (20, 20, 20)).astype(np.int32),
    np.eye(4)).to_filename('image1.nii.gz')

# Now load that image in using
# the recommended get_fdata
image1 = nib.load('image1.nii.gz')
data1  = image1.get_fdata()

# Create a copy of the image, preserving
# the header info from the original image.
# Because data1 is float64, and the image
# header specifies int32, the data will be
# rescaled to use the full int32 range.
image2 = nib.Nifti1Image(data1, None, header=image1.header)
image2.to_filename('image2.nii.gz')

# Load that image in, and create a binary
# mask from it.
#
# (Here, because scl_slope and scl_inter
# are set on the image, it doesn't matter
# whether we use get_fdata or dataobj -
# in either case we are given the data as
# float64).
#
# Again, we use the same header, and the
# data is thus rescaled again and saved
# as int32.
image3             = nib.load('image2.nii.gz')
data3              = np.asarray(image3.dataobj)
data3[data3 <= 50] = 0
data3[data3 >  50] = 1
image3             = nib.Nifti1Image(data3, None, header=image1.header)
image3.to_filename('image3.nii.gz')

print(f'number of voxels in binary mask: {data3.sum()}')

# Re-load the mask from file. We're
# using it as a mask, so we'll cast it
# to an integer type
image4 = nib.load('image3.nii.gz')
data4  = np.asarray(image4.dataobj).astype(np.int8)

# But now we have a problem - the re-scaling
# which took place when image3.nii.gz was
# loaded has produced values which are
# slightly less than 1, so the cast to int8
# has truncated them and produced a binary
# mask containing all zeros.
print(f'number of voxels in binary mask: {data4.sum()}')
print(f'                          Slope: {image4.dataobj.slope}')
print(f'                          Inter: {image4.dataobj.inter}')

Example output (note the teensy scaling factor):

number of voxels in binary mask: 3980.0
number of voxels in binary mask: 0
                          Slope: 2.3283064365386963e-10
                          Inter: 0.5
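For what it's worth, the round trip stays lossless if the data is rounded and converted to the storage type before each save (a sketch applying the third strategy above; with matching array and header dtypes, nibabel does not need to set scl_slope / scl_inter at all):

# As before, but round and convert to int32 before saving, so
# the array dtype matches the header dtype and no rescaling is
# applied.
image1 = nib.load('image1.nii.gz')
data1  = np.rint(image1.get_fdata()).astype(np.int32)

image2 = nib.Nifti1Image(data1, None, header=image1.header)
image2.to_filename('image2.nii.gz')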
@joshuacwnewton (Contributor) commented Nov 11, 2022

Hi there! This issue is of great interest to our project (as indicated by the number of pings in the history above).

We work with integer-typed images quite frequently (e.g. integer-valued labels for image landmarks, binary masks, etc.). When our integer data arrays are coerced to float via get_fdata (recommended since the deprecation of get_data), we get into situations where the header still lists int as the datatype, but the array itself is float-typed. This mismatch then triggers float -> int scaling, which can sometimes change values from, say, 3 to 3.0000457763671875.

To highlight one of the proposed solutions from the issue description:

  1. When loading, loading the input data as its storage type

Right now, to implement this suggestion, we're considering using a pattern akin to:

im.get_fdata().astype(im.header.get_data_dtype())

The idea is to use this pattern on the very first call to .get_fdata in the example above, to avoid the problematic rescaling.

Though, this assumes that the header metadata is trustworthy. I don't know if this is a safe assumption to make...
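One possible safeguard (just a sketch; load_as_header_dtype is a hypothetical helper, not nibabel API) is to verify that the cast is lossless before trusting the header dtype:

import numpy as np

def load_as_header_dtype(im, atol=1e-6):
    # Hypothetical helper: load data as the dtype declared in the
    # header, but only after checking no information is lost.
    data  = im.get_fdata()
    dtype = im.header.get_data_dtype()
    if np.issubdtype(dtype, np.integer):
        rounded = np.rint(data)
        if not np.allclose(data, rounded, atol=atol):
            raise ValueError(f'Data is not integral; header dtype '
                             f'{dtype} may not be trustworthy')
        return rounded.astype(dtype)
    return data.astype(dtype)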

@effigies (Member)

I guess we did not get around to writing documentation, so here are some notes (a combined sketch follows the list):

  1. If you want the old img.get_data() behavior, just use np.asanyarray(img.dataobj). This will keep the original data type unless scaling is applied. If scaling factors are set in the header, you will get floats.
  2. If you want a predictable type, use np.int16(img.dataobj) (or whatever type you want). You are welcome to put in whatever checks you want, or determine your dtype dynamically. You could equivalently use np.asanyarray(img.dataobj, np.int16).
  3. You may wish to do some slicing. It is most efficient on the dataobj: img.dataobj[:, :, :, 0] will give you the first volume, and np.uint8(img.dataobj[:, :, :, 0]) will cast it to uint8.
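Putting those notes together in one place (the filename is a placeholder, int16 / uint8 are just example dtypes, and the slicing line assumes a 4D image):

import numpy as np
import nibabel as nib

img = nib.load('image.nii.gz')

# 1. Old img.get_data() behaviour: storage dtype, unless scaling
#    factors are set, in which case you get floats.
data = np.asanyarray(img.dataobj)

# 2. A predictable dtype, chosen explicitly.
data = np.int16(img.dataobj)
data = np.asanyarray(img.dataobj, np.int16)  # equivalent

# 3. Slice the dataobj to read only what you need, then cast.
#    (Assumes a 4D image; gives the first volume as uint8.)
vol0 = np.uint8(img.dataobj[:, :, :, 0])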
