-
-
Notifications
You must be signed in to change notification settings - Fork 471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add VR type check when element values are being set #1414
Comments
This is at leastly partly implemented by #1543. @scaramallion - I'm not sure what remains to do here, can you have a look? |
I think originally I created the issue because a user was confused between supplying a raw bytes value for one of the longer O* VRs and supplying the decoded value. It's also something I sometimes do, particularly with SH and integer VRs like US. I think it'd still be a nice enhancement to cover all VRs when setting new values. |
As far as I understand, you mean validation for non-string VRs, which is currently not done - this would make sense. I would say that for all numeric VRs (US, UL, UV, SS, SL, SV, FL, FD) only numbers in the correct range shall be allowed, for O* and AT probably arrays of these numbers or byte arrays (at least for OB, OW, OL, OV and maybe AT), but never strings.
Is this about assigning (decoded) string values instead of the raw byte values?
Not sure about SH - this is a string VR and should already be validated. Can you elaborate on this? |
You're right about the SH/string VRs, that should get caught now. |
Ok, so you are basically saying that all O* values should always get raw bytes values, because they have a VM of 1, while the numerical VRs could get arrays of numbers (for VM > 1). One could argue that it would still be convenient to set a float array for an OF value, but the best way is probably to use a numpy float array and convert it to a byte array, so I guess your approach makes sense. I will see if I can put something together, maybe over the weekend. |
I'm not as deep in this as you two are, but I'd like to raise the principle that all native Python types that make sense should be accepted, as well as raw bytes. My recollection is that raw bytes on setting are accepted almost as a side-effect because that is what is done when reading from file, and the same machinery is used. Ultimately (in future pydicom versions), decoding to native types (like arrays) should be done if possible when converting the raw data element - although the ambiguous types depend on other data elements, unfortunately, as do the shapes if not a 1-d array. So I think numpy arrays, Python lists or Python arrays of the appropriate type could be accepted on setting, as well as single values if needed. E.g. for OW an array of |
This is also my gut feeling, as I wrote above, though I think that this is currently not possible. Even wih the VM of 1 the O* VRs are semantically arrays of numbers, so I also think it should be possible to set them that way - though this is something that can be added separately later on. Also I think that for these types (both numeric and O*) validation makes sense only for writing, not for reading. In principle we could add a length check for some of them that should be a multiple of 4 or 8 (FL, FD, UV, OF, OD, OV), though I'm not sure if this is needed. |
One question, where I'm unsure: shall be allow to set bytes to a numerical VR? This is from a test: elem = DataElement(0x00100010, 'US', b'\x00') Should this be allowed, or should it issue a warning? I'm currently adding a check for these values, allowing only |
- also add type validation for validated VRs - see pydicom#1414
- also add type validation for validated VRs - see pydicom#1414
I suspect this may be used internally within pydicom. In any case, it has been allowed, so for backwards compatibility, that should still be the default, IMHO. |
Ok, I added this (with only a check for the length). |
* Add validation for numerical VR values - also add type validation for validated VRs - see #1414 * Fix handling of invalid type in length check - add more tests * Fix indentation * Also allow bytes for numeric values
As the PR got merged, I'm copying my comment here to avoid forgetting the question: For the time being I left out the O* VRs and AT, as I'm not completely sure yet how to handle them. First there is the issue of which types shall be supported by assignment, which is currently discussed in the issue - I want to have some agreement here first. Second - if we allow something like list or array for assignment, we could also check each single value for validity, but I find that too much of an overhead, so I would not want to do this - but I may be wrong here. |
I like just having |
The complexity of typing is a valid point, but I'd hate to sacrifice a 'natural' fit assignment for that reason alone. I really like the philosophy that encoding/decoding are done at I/O and in between you usually work with native Python types (and to me, numpy arrays are the native type for multivalue binary data). I could also see that someone might assign a value and then also do some kind of calculation with it later, so they would need it as numpy array again.
Numpy array items are already guaranteed to be of the same type. It looks like Python For python Perhaps a reasonable compromise is to at least allow numpy arrays for now? I expect that is what most people would use. If the user is not just carrying around |
I actually have been thinking around the same lines. Lists would be problematic to check, and Python arrays are not really used in this context, but numpy arrays are a natural choice here. If I understand @scaramallion correctly, he is saying that you can always convert a numpy array to bytes using |
Just to be clear, not actually at the assignment time, right? - only when encoding the dataset e.g. for file write. |
Actually I haven't thought about this... But you are right, it would be unexpected if the array would change after the assignment. A round trip should produce the same array. |
We should add a VR type check on setting element values to ensure they're valid. This should help users when an invalid value is used and with maintenance as it will make it easier to reason what types should be expected.
The text was updated successfully, but these errors were encountered: