-
-
Notifications
You must be signed in to change notification settings - Fork 471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Add image file reader class #1447
base: main
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1447 +/- ##
==========================================
- Coverage 97.42% 96.45% -0.98%
==========================================
Files 66 66
Lines 10182 10425 +243
==========================================
+ Hits 9920 10055 +135
- Misses 262 370 +108
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few thoughts from a first pass...
Re my comment on metadata
, I wonder if there is a way to integrate this into Dataset
- just off the cuff here, could we have something like a frames
property as an alternate to pixel_array
? Generally I don't like to complicate Dataset
with new code, but this is quite a common need, and important for large file sizes.
Examples | ||
-------- | ||
>>> from highdicom.io import ImageFileReader | ||
>>> with ImageFileReader('/path/to/file.dcm') as image: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reference to highdicom needs replacing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed via 8b58178
""" | ||
|
||
def __init__(self, filename: str): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also include a file object rather than just a filename? Inevitably someone needing to work with streams will ask for it, although some may be limited if there are negative seek
s, as seen for example in #753.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally the idea of allowing the user to alternatively provide an existing file object and have refactored the class accordingly via e1056bd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have decided to not set the is_little_endian
and is_implicit_VR
attributes on the file object, so that the object does not get modified. Instead an error is raised if their values are not correct.
This is one of the reasons I initially implemented the class such that the file object gets created under the hood and the attributes are set correctly.
@darcymason What are your thoughts on this?
|
||
|
||
class ImageFileReader(object): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subclassing from object
only needed for Python 2, which we have dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed via 0b5b68c
>>> from highdicom.io import ImageFileReader | ||
>>> with ImageFileReader('/path/to/file.dcm') as image: | ||
... print(image.metadata) | ||
... for i in range(image.number_of_frames): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of the term metadata
here, especially because it sounds too much like the File Meta Information defined for part 10 files in the Standard. But also because it is really a Dataset (from dcmread
on open
call) of all data elements except the pixel data; pydicom (and DICOM itself) don't give unique status to pixel data, unlike other file types with a 'header' or 'metadata'; in DICOM they are all just data elements.
Having said that, I'm not sure what to call it. In pydicom, using stop_before_pixels=True
just gives a Dataset, simply with the pixel data element missing. I'll mull this over a little more...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have heard that term too, and have also been confused by it. I think it comes from image formats like TIFF who have embedded metadata tags.
What I encounter more often is "DICOM header data" - not sure if that would be a better term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DICOM header data
I think 'header' applies to other formats where the pixel data is not tagged, but whatever is left after some initial data is just raw pixel bytes. It assumes every file is basically pixel data, plus a few extra things.
In DICOM, many file types (SOP Classes) do not have pixel data, so the whole file would be 'header', which doesn't really make sense. Same goes for metadata
in the sense of everything that is not pixels.
could we have something like a
frames
property as an alternate topixel_array
?
Any thoughts on the above general comment? If this could work as an add-on to Dataset
, then the nomenclature issue could go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In DICOM, many file types (SOP Classes) do not have pixel data, so the whole file would be 'header', which doesn't really make sense.
Fair enough. I didn't really like the term too much, was just thinking of an alternative to metadata.
Any thoughts on the above general comment?
I think it makes sense. Frames are an inherent property of DICOM files, so it would fit IMHO. Also, I think that the ability to read single frames is an important feature that belongs into the core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding terminology, I think that anyone who uses DICOMweb will be familiar with the term "metadata" and the value returned by the metadata
property will (more or less) correspond to what one would get via a DICOMweb /metadata
call. Similarly, the value returned by read_frame()
corresponds to what one would get via a DICOMweb /frames/{frameNumber}
call.
I initially also thought about adding this functionality to the
|
@darcymason not sure whether you noticed the color correction feature (see here), which depends on #1446. This feature is potentially a bit tricky, because it requires Pillow. We could also leave that out for now. |
This had crossed my mind, but I was thinking (without saying it 'out loud' of course) that we might want to move in this direction anyway with memory mapping, as noted in previous discussions like #1267. I think it might be interesting to explore whether memmap'ng can be used on all files (or at least ones above a modest size) transparently, without impacting performance of reading smaller files. I think everyone dealing with large files would like that behavior, and if was transparent in other cases, no one would complain. I've also was re-reading the discussion about combining
I'm not clear on how the separate class is different in this respect.
I generally agree very much with avoiding image-specific methods, but we already have I'll keep mulling all this though, happy to hear other thoughts... |
I would consider implementing a |
Personally, I would distinguish between the representation of a DICOM Data Set in memory and a DICOM Part 10 file stored on disk. We should not assume that every However, independent of whether we'll further pursue the memory mapping, the |
Good point.
That seems reasonable - we could start here and perhaps incorporate into Dataset at some point in future, but with the logic kept in the separate class. |
@darcymason I removed the color management code from the We will need Pillow for color management (at least in its current implementation). However, Pillow is an optional dependency and therefore requires (not so elegant) workarounds to avoid Pillow symbols from being exposed globally. I would suggest working on getting the |
@darcymason are there any outstanding TODOs preventing this PR from being merged and released? |
Not from me, but I was hoping @scaramallion could review and comment on how/whether this fits in with the various handlers concepts. |
@scaramallion what are your thoughts? |
@scaramallion I was wondering whether you had a chance to take a look. Please let me know if you have any questions or comments. It would be great if we could get clarity on whether we can include this functionality sometime soon. |
@darcymason @scaramallion this is currently blocking a feature I would like to add to the dicomweb-client library. The library already depends on pydicom and I don't want to introduce a new dependency on highdicom just for the low-level file I/O functionality of |
I'll have another look at this today - my only concern is to have some thought about the principles of how this is worked in, to do our best not to have to rework the 'API' later to fit a more general idea of specific handlers. |
First, apologies that this PR has dragged on for a long time - it is harder when it comes to potentially 'structural'/API-like issues. I keep thinking that I'll go over it all again and try to think through the full implications/alternatives. And, sometimes hope that a kind of eureka moment will just happen. Having said that, I had a good look again through this PR, and existing pydicom code, and had a good think about it. Then I read through the discussion above again and see that I came up with basically the same concerns as I had before. Before I had kind of accepted the idea we could adapt downstream, but on further thought it is usually much easier to avoid adding something which is deprecated and removed later. I do think we need some more discussion of a The question of how/when items are loaded into memory is still there. On that I came back to my previous idea of a config passed to If you like, I could put together an alternate branch (try to commit to doing so this week) which adapts the code to the And I'm still interested in @scaramallion's opinion - this seems very related to the existing pydicom |
Thanks for your review and feedback @darcymason!
Personally, I would not add more methods to the However, reading individual frames efficiently will require partial reading of the data set and building (and caching!) a Basic Offset Table. This is all functionality that should not be something the
I don't consider the |
Actually, reading from a file is basically all pydicom ever claimed to do... it's the first line in our README:
And I should say that when I say I do generally like the idea of decoupling IO from data. It's a nice principle, but sometimes practicality can trump principle. We did historically offer a 'deferred read' concept, which was not well-tested and was removed a while ago because of that, but could be brought back. And IMO the proposed class here doesn't really do it either, except for chunking off the pre-read data into I'm busy through the weekend, but I'll try to think this over some more and make a decision next week. |
Thanks @darcymason. Much appreciated!
I strongly disagree with that notion and tend argue that it goes fundamentally against the DICOM standard, which is after all primarily concerned with communication of data over network. If the library has the intention of limiting The API of the library so far has also nicely separated the representation of the Data Set ( cc @dclunie
Good point! The |
@darcymason @scaramallion is this something you still think would be useful to add to pydicom? |
In my opinion, this would be very useful. I just stumbled over this PR because of a related SO question and I'm not sure in what state it is, but I had to access single frames of large multi-frame images in the past and find it an important feature (we had helped dcmtk to add a similar feature at the time). |
Describe the changes
Adds support for reading efficiently individual frames of a (multi-frame) image without loading the entire Pixel Data element into memory (see also #534, #1263, #1243).
Tasks
doc/_build/html/index.html
)