[MRG] Add framereader module #1725
base: main
Codecov Report
@@            Coverage Diff             @@
##           master    #1725      +/-   ##
==========================================
+ Coverage   97.59%   97.68%   +0.09%
==========================================
  Files          66       67       +1
  Lines       10769    11125     +356
==========================================
+ Hits        10510    10868     +358
+ Misses        259      257       -2
Not sure why mypy is complaining about files that are not changed by this MR - I haven't been able to reproduce locally.
fragment_offsets.append(frame_position - first_frame_location)
first_two_bytes = fp.read(2, True)
if not fp.is_little_endian:
    first_two_bytes = first_two_bytes[::-1]
I could remove this, since encapsulated Pixel Data appears to always be little endian, but I'm leaving it in case @hackermd included the if statement for some other reason.
Yes, this is probably not needed, but maybe @hackermd can comment on this. Or review this PR in general, for that matter...
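For context on the endianness question: as far as I know, every encapsulated transfer syntax is Explicit VR Little Endian, so the item headers inside encapsulated Pixel Data (the Basic Offset Table item, the fragment items and the Sequence Delimitation Item) are always little endian. The sketch below walks those items with a fixed "<" struct format and records each fragment's offset relative to the first fragment, mirroring the fragment_offsets.append(...) line in the diff. The function name read_item_headers is hypothetical, and the sketch assumes the input bytes start at the Basic Offset Table item.

```python
import struct
from typing import List, Tuple

ITEM_TAG = (0xFFFE, 0xE000)
SEQUENCE_DELIMITER_TAG = (0xFFFE, 0xE0DD)


def read_item_headers(value: bytes) -> Tuple[List[int], List[int]]:
    """Return (basic_offset_table, fragment_offsets) for encapsulated Pixel Data.

    Hypothetical helper. ``value`` is assumed to start at the Basic Offset
    Table item, i.e. just after the Pixel Data element's own tag, VR and
    undefined length.
    """
    # Item headers are always read as little endian: group, element, 32-bit length
    group, elem, length = struct.unpack_from("<HHL", value, 0)
    if (group, elem) != ITEM_TAG:
        raise ValueError("expected the Basic Offset Table item")
    pos = 8
    bot = list(struct.unpack_from(f"<{length // 4}L", value, pos))
    pos += length

    first_fragment = pos
    fragment_offsets: List[int] = []
    while pos + 8 <= len(value):
        group, elem, length = struct.unpack_from("<HHL", value, pos)
        if (group, elem) == SEQUENCE_DELIMITER_TAG:
            break
        if (group, elem) != ITEM_TAG:
            raise ValueError(f"unexpected tag ({group:04X},{elem:04X}) at {pos}")
        # Same reference point as the Basic Offset Table: the first byte of
        # the first fragment's item tag
        fragment_offsets.append(pos - first_fragment)
        pos += 8 + length
    return bot, fragment_offsets
```

With one fragment per frame, the fragment offsets computed this way should match a populated Basic Offset Table, which is what makes it possible to rebuild an empty one.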
I have only had a cursory glance so far and will probably take a closer look in a couple of days. Generally, I think this is a very valuable addition, and it is good to continue the work @hackermd had started here. If he hasn't already, I suggest that he also review this.
This also needs an entry in the release notes, and ideally a description in the documentation related to reading pixel data.
Thanks for all your great work @kalebdfischer! I will take a closer look in the next couple of days.
Thanks for putting this together. I've done a quick pass through, will try to look again in more detail later. Just some minor comments at this point.
@@ -0,0 +1,1132 @@
# Copyright 2008-2022 pydicom authors. See LICENSE file for details.
"""Utilities for parsing DICOM PixelData frames.
Nitpick: the copyright here can just say 2022, which helps flag that the file is newer.
updated here
}
_UINT_PIXEL_DATA_TAGS = {
    0x7FE00010,
}
Perhaps this should be INT without the U, since in general pixels can be signed or unsigned.
updated here
pydicom/framereader.py
"""
if fp.tell() != pixel_data_location:
    fp.seek(pixel_data_location, 0)
Is the docstring about IOError correct? It seems to ignore the mismatch and just move the file pointer itself.
Good catch; I missed the discrepancy in the refactor. Updated to validate and raise here.
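A minimal sketch of the validate-and-raise behaviour, written as a hypothetical standalone helper (check_position is not the PR's code). It raises IOError, which in Python 3 is an alias of OSError, instead of silently seeking:

```python
import io
from typing import BinaryIO


def check_position(fp: BinaryIO, pixel_data_location: int) -> None:
    """Raise IOError if ``fp`` is not positioned at ``pixel_data_location``.

    Hypothetical helper: the caller is responsible for positioning the file,
    so a mismatch signals a bug rather than something to paper over.
    """
    position = fp.tell()
    if position != pixel_data_location:
        raise IOError(
            f"expected file position {pixel_data_location}, got {position}"
        )


fp = io.BytesIO(bytes(16))
fp.seek(4)
check_position(fp, 4)  # passes silently; check_position(fp, 0) would raise
```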
pydicom/framereader.py
ds_copy.file_meta = FileMetaDataset()
ds_copy.file_meta.TransferSyntaxUID = getattr(
    original_dataset.file_meta, "TransferSyntaxUID"
)
Without a getattr default, this is the same as original_dataset.file_meta.TransferSyntaxUID. Did you mean to default to None or something else? In that case you can use original_dataset.file_meta.get("TransferSyntaxUID").
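A minimal sketch of the getattr-with-default pattern combined with raising ValueError for a missing required attribute. The helper name require_attr is hypothetical, and types.SimpleNamespace stands in for a FileMetaDataset so the example runs without pydicom; a pydicom Dataset likewise accepts getattr(ds, keyword, None) for missing elements.

```python
from types import SimpleNamespace
from typing import Any


def require_attr(obj: Any, keyword: str) -> Any:
    """Return ``obj.<keyword>``, raising ValueError if it is missing or None.

    Hypothetical helper; the getattr default turns a missing attribute into
    None instead of relying on an AttributeError propagating.
    """
    value = getattr(obj, keyword, None)
    if value is None:
        raise ValueError(f"required attribute '{keyword}' is missing")
    return value


file_meta = SimpleNamespace(TransferSyntaxUID="1.2.840.10008.1.2.1")
transfer_syntax_uid = require_attr(file_meta, "TransferSyntaxUID")
```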
It was lazy code that relied on AttributeError being raised. I've updated the function to raise ValueError for missing required attributes.
"""
logger.debug("read File Meta Information")
self.fp
This bare statement seems fragile. I assume it triggers the fp property to open the file, but it would not do so on a second call if self._fp had not been reset.
Accessing self.fp will raise if the file doesn't exist, or if another exception occurs while opening a BufferedReader; either case triggers the exit logic. It's a fair point, though, that a problem could go undetected if a bad file-like object is provided instead of a path.
I've done a couple of things to improve behavior and readability:
- added a try/except (with re-raise) around the open call, with an extra logging statement to improve readability
- called self.dicom_file_like instead, since it actually requires valid frame_info and DicomFileLike instantiation
Let me know if there are other improvements you'd like.
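One way to avoid a bare self.fp statement is an explicit, idempotent open() method. The LazyFile class below is a hypothetical sketch, not the PR's code: it opens paths itself (logging and re-raising on failure) and checks that a file-like object at least has read, seek and tell, so a bad file-like is caught early instead of going undetected.

```python
import io
import logging
from pathlib import Path
from typing import BinaryIO, Optional, Union

logger = logging.getLogger(__name__)


class LazyFile:
    """Hypothetical sketch: open a path, or validate a file-like, on first use."""

    def __init__(self, source: Union[str, Path, BinaryIO]) -> None:
        self._source = source
        self._fp: Optional[BinaryIO] = None

    def open(self) -> BinaryIO:
        """Return the open file, opening it on the first call only."""
        if self._fp is None:
            if isinstance(self._source, (str, Path)):
                try:
                    self._fp = open(self._source, "rb")
                except OSError:
                    logger.error("could not open %s", self._source)
                    raise  # re-raise so the caller still sees the original error
            else:
                # Fail early if a file-like object lacks the methods we rely on
                for name in ("read", "seek", "tell"):
                    if not hasattr(self._source, name):
                        raise TypeError(f"file-like object has no {name}() method")
                self._fp = self._source
        return self._fp

    def close(self) -> None:
        """Close files we opened ourselves; never close a caller's buffer."""
        if self._fp is not None and isinstance(self._source, (str, Path)):
            self._fp.close()
        self._fp = None


reader = LazyFile(io.BytesIO(b"DICM"))
fp = reader.open()  # explicit call instead of a bare ``self.fp`` statement
```

Because open() is safe to call repeatedly, callers can invoke it at the top of every public method without tracking whether the file has already been opened.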
    for f in get_testdata_files("*.dcm")
    if not any([f.endswith(x) for x in cannot_read])
]
Easier and more readable, I think, to make cannot_read a set, and then all_paths = (Path(f) for f in get_testdata_files("*.dcm")), and can_read_paths = [p for p in all_paths if p.name not in cannot_read].
Really, pydicom should have a get_testdata_paths function, since we've been trying to move more towards pathlib; something for a future PR if we can remember...
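The suggested filtering, sketched as a hypothetical filter_readable helper that takes the file names as an argument (in the pydicom tests they would come from get_testdata_files("*.dcm")). Note that p.name not in cannot_read matches whole file names, which is slightly stricter than the original endswith check.

```python
from pathlib import Path
from typing import Iterable, List, Set


def filter_readable(filenames: Iterable[str], cannot_read: Set[str]) -> List[Path]:
    """Return the paths whose file name is not listed in ``cannot_read``."""
    all_paths = (Path(f) for f in filenames)
    # Set membership on the bare file name: O(1) per path, no nested loop
    return [p for p in all_paths if p.name not in cannot_read]
```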
Changed here
I've updated the release notes and working_with_pixel_data, and added a framereader.rst file for the API reference. Happy to wait for @hackermd's review.
I started to review this PR and added comments. However, I ultimately concluded that we should have additional design discussions.
The PR adds several classes (FrameDataset, FrameInfo, etc.) and methods (to_dict()) that do not directly map to DICOM information entities and are therefore problematic in my opinion. The rest of the library closely follows and directly implements the DICOM information model (e.g., DataElement, Dataset, etc.). The changes introduced via this PR diverge from that approach.
Furthermore, I think the functionality mainly belongs in the filereader module. The main class is intended for reading frames from files (or file-like objects) and uses file-specific methods (read(), seek(), tell(), etc.). It is not intended to decode frames from a Dataset that has already been fully read into memory. In other words, it is very different from the Dataset.pixel_array property and similar to dcmread(). What differentiates it from dcmread() is that it provides access to individual frames and that it caches information (metadata and the BOT) to be reused for repeated frame-level access. For improved cohesion, I would thus move all the functionality related to file reading into the filereader module. Functionality related to frame decoding can remain in frame.
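To illustrate the caching point, here is a hypothetical sketch (CachedFrameReader is not part of the PR or of highdicom) of frame-level access that reuses a Basic Offset Table read once: each frame is fetched with a single seek() and read() on the file. It assumes one fragment per frame and that the absolute position of the first fragment's item tag is already known.

```python
import io
import struct
from typing import BinaryIO, List

ITEM_TAG = (0xFFFE, 0xE000)


class CachedFrameReader:
    """Hypothetical sketch: reuse a cached Basic Offset Table for frame access.

    Assumes one fragment per frame and that ``first_fragment_position`` is the
    absolute file position of the first fragment's item tag.
    """

    def __init__(
        self, fp: BinaryIO, first_fragment_position: int, offsets: List[int]
    ) -> None:
        self._fp = fp
        self._first = first_fragment_position
        self._offsets = list(offsets)  # read once, reused for every frame

    @property
    def number_of_frames(self) -> int:
        return len(self._offsets)

    def read_frame(self, index: int) -> bytes:
        """Seek straight to one frame and read only its bytes."""
        self._fp.seek(self._first + self._offsets[index])
        group, elem, length = struct.unpack("<HHL", self._fp.read(8))
        if (group, elem) != ITEM_TAG:
            raise ValueError(f"no item tag at the start of frame {index}")
        return self._fp.read(length)


def _item(data: bytes) -> bytes:
    """Build one fragment item (little endian header + data) for the demo."""
    return struct.pack("<HHL", 0xFFFE, 0xE000, len(data)) + data


# Demo: 10 bytes of preceding data, then two single-fragment frames
buffer = io.BytesIO(b"\x00" * 10 + _item(b"abcd") + _item(b"ef"))
reader = CachedFrameReader(buffer, first_fragment_position=10, offsets=[0, 12])
```

Only the file-level state (position of the first fragment and the offsets) is cached here; decoding the returned bytes is left to separate frame-decoding code, in line with the split between file reading and decoding proposed above.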
pydicom/framereader.py
return basic_offset_table


def get_dataset_copy_with_frame_attrs(
I would not expose that function publicly. To me, this should be an implementation detail. I would even argue that the individual elements should be passed directly to decoders and that missing elements should be handled explicitly.
pydicom/framereader.py
def decode_frame(
    frame_bytes: bytes, original_dataset: Dataset
I suggest passing the individual data elements to the function rather than original_dataset. That is more verbose, but easier to understand.
Specifically, I suggest using the existing implementation in highdicom: https://github.com/herrmannlab/highdicom/blob/d36e2a50350fd15531f549f52dafd87587a80afe/src/highdicom/frame.py#L327-L451
If that gets implemented here in pydicom, we can just import it in highdicom.
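As a rough illustration of passing individual data elements instead of original_dataset, here is a hypothetical decode_native_frame for uncompressed little endian data only. It is not the highdicom implementation linked above; it ignores Bits Stored and handles only Bits Allocated of 8, 16 or 32, but it shows how each attribute the decoder depends on becomes an explicit parameter.

```python
import struct
from typing import List


def decode_native_frame(
    frame_bytes: bytes,
    rows: int,
    columns: int,
    samples_per_pixel: int,
    bits_allocated: int,
    pixel_representation: int,
) -> List[int]:
    """Hypothetical sketch: unpack one uncompressed little endian frame.

    Every input comes from an explicit argument rather than a Dataset.
    Bits Stored masking/sign extension and 1-bit data are not handled.
    """
    codes = {8: "B", 16: "H", 32: "L"}  # unsigned struct codes per bit depth
    if bits_allocated not in codes:
        raise ValueError(f"unsupported Bits Allocated: {bits_allocated}")
    code = codes[bits_allocated]
    if pixel_representation == 1:  # two's complement (signed) pixels
        code = code.lower()
    count = rows * columns * samples_per_pixel
    expected = count * bits_allocated // 8
    if len(frame_bytes) < expected:
        raise ValueError(f"expected {expected} bytes, got {len(frame_bytes)}")
    return list(struct.unpack_from(f"<{count}{code}", frame_bytes))
```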
Happy to refactor using the function from highdicom, but I came across an issue: this line has mismatched typing. I'm not terribly comfortable just telling mypy to ignore this. Do you have any suggestions?
pydicom/framereader.py
return ds_temp.pixel_array


class BasicOffsetTable(List):
I would not make that a subclass of List and then add to_dict() and from_dict() methods. If you really want to implement it as a class, I suggest using a NamedTuple instead.
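A sketch of the NamedTuple alternative. The fields (offsets, plus first_frame_location, borrowed from the variable name in the first diff snippet of this review) are assumptions rather than the PR's actual layout. A NamedTuple already provides _asdict() for serialisation, so custom to_dict()/from_dict() methods are not needed.

```python
from typing import NamedTuple, Tuple


class BasicOffsetTable(NamedTuple):
    """Hypothetical sketch: an immutable Basic Offset Table."""

    offsets: Tuple[int, ...]  # relative to the first fragment's item tag
    first_frame_location: int  # assumed absolute file position of that tag

    def frame_position(self, index: int) -> int:
        """Absolute file position of a frame's first item tag."""
        return self.first_frame_location + self.offsets[index]


bot = BasicOffsetTable(offsets=(0, 12), first_frame_location=1234)
as_dict = bot._asdict()  # built into NamedTuple
restored = BasicOffsetTable(**as_dict)
```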
@hackermd thanks for the original MR! We've been maintaining a pydicom branch with the ImageFileReader for For those using cloud storage for files, downloading files is one of the greatest sources of latency. This refactor is going to enable thriftier frame retrieval: the frame information can be stored, and on subsequent reads one can request just the frame bytes from storage without needing to retrieve the whole file.
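For the cloud storage case, cached offsets translate directly into HTTP byte-range requests. A hypothetical sketch (the function names and inputs are assumptions): given the absolute position of the first fragment, the Basic Offset Table offsets and the position where the last fragment ends, compute one inclusive byte range per frame and format it as a Range header value. It assumes one fragment per frame, and each range includes the fragment's 8-byte item header.

```python
from typing import List, Tuple


def frame_byte_ranges(
    first_fragment_position: int, offsets: List[int], fragments_end: int
) -> List[Tuple[int, int]]:
    """Hypothetical sketch: inclusive (start, end) byte ranges, one per frame.

    ``fragments_end`` is assumed to be the absolute position just past the
    last fragment (e.g. where the Sequence Delimitation Item starts).
    """
    starts = [first_fragment_position + offset for offset in offsets]
    ends = starts[1:] + [fragments_end]
    # HTTP ranges are inclusive, hence ``end - 1``
    return [(start, end - 1) for start, end in zip(starts, ends)]


def range_header(byte_range: Tuple[int, int]) -> str:
    """Format an HTTP Range header value for one inclusive byte range."""
    start, end = byte_range
    return f"bytes={start}-{end}"
```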
Have you considered using
@kalebdfischer @darcymason please note that the I think it would be great to have that class implemented in pydicom so that we can just import it in highdicom. However, that would require the API to remain the same.
Thanks for the recommendation - our DIMSE service actually uses dicomweb_client. There are a number of reasons that pydicom is more ergonomic and practical for our OHIF backend service.
I'm happy to help with any refactoring needed by highdicom. I will take a deeper look this afternoon to ensure that this PR doesn't make the refactoring more of a pain than it needs to be.
It seems reasonable to me to make this as compatible as possible. If it's going to be here anyway, you might as well not have duplicate functionality in highdicom.
Also a good point. Pydicom has always tried to use DICOM nomenclature wherever possible.
return frame_dataset


class FrameInfo:
Also, be aware that there are datasets with hundreds of thousands of frames and we don't want this class to become a performance bottleneck.
I don't see any way that this class imposes a performance bottleneck that ImageFileReader does not.
Yes, if the API and behavior remain the same,
Thank you for the in-depth review. I'll be working on revisions today.
Could you elaborate on what the issue is and what changes you'd like? It doesn't seem terribly reasonable to remove tooling that allows storage to and from a JSON-like object, but I'm happy to make improvements if there are specific suggestions.
I'll defer to the maintainers on this one, but the benefits of a separate module are:
The library implements the DICOM standard and uses the terms defined by the DICOM Model of the Real World and DICOM Information Model. For example, "Frame Info" is not a defined Information Entity (IE), Information Object Definition (IOD), Module, Macro, Attribute, etc.
The problem with the term
@kalebdfischer I suggest implementing the API of the highdicom.io.ImageFileReader class. I am all for improvement of implementation details, but our goal is to avoid duplication of functionality between pydicom and highdicom. Once the class is implemented in pydicom, we can deprecate it in highdicom. It would be great if changes were based on #1447 so that we can more easily track changes.
@kalebdfischer, @hackermd, @mrbean-bremen, and perhaps @pieper @fedorov @CPBridge @scaramallion: I'm reviewing issues again for consideration in release 2.4. This discussion has been idle for a while, and I think this is too complex to resolve in short order, and I'd rather get the release out in the next few weeks. So I suggest re-assigning this to the following (major) release, 3.0. Perhaps, as part of the Roadmap development, it will also become clearer how pydicom should incorporate this kind of functionality. Any strong thoughts on this?
Describe the changes
Adds support for efficiently reading individual frames of a (multi-frame) image without loading the entire Pixel Data element into memory (see also #534, #1263, #1243).
This work is based on @hackermd's ImageFileReader MR, with the following noteworthy changes:
OBXXXX1A_rle_2frame.dcm and rtdose_rle.dcm)
SC_rgb_rle.dcm which claim to be explicit, but are actually implicit.
if encapsulated
metadata
defer_size, force, specific_tags
Tasks
doc/_build/html/index.html)