Add VR type check when element values are being set #1414

scaramallion · 2021-06-16T21:18:38Z

We should add a VR type check on setting element values to ensure they're valid. This should help users when an invalid value is used and with maintenance as it will make it easier to reason what types should be expected.

mrbean-bremen · 2022-04-19T18:31:27Z

This is at leastly partly implemented by #1543. @scaramallion - I'm not sure what remains to do here, can you have a look?

scaramallion · 2022-04-19T22:23:22Z

I think originally I created the issue because a user was confused between supplying a raw bytes value for one of the longer O* VRs and supplying the decoded value.

It's also something I sometimes do, particularly with SH and integer VRs like US. I think it'd still be a nice enhancement to cover all VRs when setting new values.

mrbean-bremen · 2022-04-20T18:30:51Z

As far as I understand, you mean validation for non-string VRs, which is currently not done - this would make sense.

I would say that for all numeric VRs (US, UL, UV, SS, SL, SV, FL, FD) only numbers in the correct range shall be allowed, for O* and AT probably arrays of these numbers or byte arrays (at least for OB, OW, OL, OV and maybe AT), but never strings.

I think originally I created the issue because a user was confused between supplying a raw bytes value for one of the longer O* VRs and supplying the decoded value.

Is this about assigning (decoded) string values instead of the raw byte values?

It's also something I sometimes do, particularly with SH and integer VRs like US.

Not sure about SH - this is a string VR and should already be validated. Can you elaborate on this?

scaramallion · 2022-04-20T23:45:18Z

This was the original issue

You're right about the SH/string VRs, that should get caught now.

mrbean-bremen · 2022-04-21T17:42:18Z

Ok, so you are basically saying that all O* values should always get raw bytes values, because they have a VM of 1, while the numerical VRs could get arrays of numbers (for VM > 1). One could argue that it would still be convenient to set a float array for an OF value, but the best way is probably to use a numpy float array and convert it to a byte array, so I guess your approach makes sense.

I will see if I can put something together, maybe over the weekend.

darcymason · 2022-04-21T18:44:27Z

Ok, so you are basically saying that all O* values should always get raw bytes values, because they have a VM of 1

I'm not as deep in this as you two are, but I'd like to raise the principle that all native Python types that make sense should be accepted, as well as raw bytes.

My recollection is that raw bytes on setting are accepted almost as a side-effect because that is what is done when reading from file, and the same machinery is used. Ultimately (in future pydicom versions), decoding to native types (like arrays) should be done if possible when converting the raw data element - although the ambiguous types depend on other data elements, unfortunately, as do the shapes if not a 1-d array.

So I think numpy arrays, Python lists or Python arrays of the appropriate type could be accepted on setting, as well as single values if needed. E.g. for OW an array of ints should be okay, if they are checked to be within the correct number of bits when strict checking.

mrbean-bremen · 2022-04-21T19:09:38Z

This is also my gut feeling, as I wrote above, though I think that this is currently not possible. Even wih the VM of 1 the O* VRs are semantically arrays of numbers, so I also think it should be possible to set them that way - though this is something that can be added separately later on.

Also I think that for these types (both numeric and O*) validation makes sense only for writing, not for reading. In principle we could add a length check for some of them that should be a multiple of 4 or 8 (FL, FD, UV, OF, OD, OV), though I'm not sure if this is needed.

mrbean-bremen · 2022-04-24T10:47:15Z

One question, where I'm unsure: shall be allow to set bytes to a numerical VR? This is from a test:

elem = DataElement(0x00100010, 'US', b'\x00')

Should this be allowed, or should it issue a warning? I'm currently adding a check for these values, allowing only int values in the correct range, but maybe I shall also allow bytes... (by "allowing" I mean that a warning is otherwise issued if using the default config)

- see pydicom#1414

- also add type validation for validated VRs - see pydicom#1414

darcymason · 2022-04-25T16:10:01Z

One question, where I'm unsure: shall be allow to set bytes to a numerical VR? This is from a test:
elem = DataElement(0x00100010, 'US', b'\x00')
Should this be allowed, or should it issue a warning? I'm currently adding a check for these values, allowing only int values in the correct range, but maybe I shall also allow bytes... (by "allowing" I mean that a warning is otherwise issued if using the default config)

I suspect this may be used internally within pydicom. In any case, it has been allowed, so for backwards compatibility, that should still be the default, IMHO.

mrbean-bremen · 2022-04-25T19:54:36Z

Ok, I added this (with only a check for the length).

* Add validation for numerical VR values - also add type validation for validated VRs - see #1414 * Fix handling of invalid type in length check - add more tests * Fix indentation * Also allow bytes for numeric values

mrbean-bremen · 2022-05-01T18:29:29Z

As the PR got merged, I'm copying my comment here to avoid forgetting the question:

For the time being I left out the O* VRs and AT, as I'm not completely sure yet how to handle them. First there is the issue of which types shall be supported by assignment, which is currently discussed in the issue - I want to have some agreement here first. Second - if we allow something like list or array for assignment, we could also check each single value for validity, but I find that too much of an overhead, so I would not want to do this - but I may be wrong here.

scaramallion · 2022-05-01T21:31:54Z

I like just having bytes for O*, our typing is already pretty complex and encoding values isn't difficult. It also matches the read type.

darcymason · 2022-05-02T16:51:26Z

I like just having bytes for O*, our typing is already pretty complex and encoding values isn't difficult

The complexity of typing is a valid point, but I'd hate to sacrifice a 'natural' fit assignment for that reason alone. I really like the philosophy that encoding/decoding are done at I/O and in between you usually work with native Python types (and to me, numpy arrays are the native type for multivalue binary data).

I could also see that someone might assign a value and then also do some kind of calculation with it later, so they would need it as numpy array again.

if we allow something like list or array for assignment, we could also check each single value for validity

Numpy array items are already guaranteed to be of the same type. It looks like Python arrays are too - I haven't really worked with them. We can fairly easily do the tobytes method on behalf of the pydicom user.

For python lists, I agree those might be a pain. We could force people to convert to numpy array, but then we could also do that quite easily, which would automatically enforce the item types.

Perhaps a reasonable compromise is to at least allow numpy arrays for now? I expect that is what most people would use. If the user is not just carrying around bytes then they would arise from some kind of calculation, no doubt done in numpy.

mrbean-bremen · 2022-05-02T18:20:42Z

Perhaps a reasonable compromise is to at least allow numpy arrays for now? I expect that is what most people would use. If the user is not just carrying around bytes then they would arise from some kind of calculation, no doubt done in numpy.

I actually have been thinking around the same lines. Lists would be problematic to check, and Python arrays are not really used in this context, but numpy arrays are a natural choice here. If I understand @scaramallion correctly, he is saying that you can always convert a numpy array to bytes using tobytes, so that would not be a big deal fo the user - but we could do this ourselves, too, if a numpy array is assigned.

darcymason · 2022-05-02T19:03:10Z

you can always convert a numpy array to bytes using tobytes, so that would not be a big deal fo the user - but we could do this ourselves, too, if a numpy array is assigned.

Just to be clear, not actually at the assignment time, right? - only when encoding the dataset e.g. for file write.

mrbean-bremen · 2022-05-02T19:33:18Z

Just to be clear, not actually at the assignment time, right? - only when encoding the dataset e.g. for file write.

Actually I haven't thought about this... But you are right, it would be unexpected if the array would change after the assignment. A round trip should produce the same array.

scaramallion added the enhancement label Jun 16, 2021

hackermd mentioned this issue Nov 16, 2021

Checks that values satisfy requirements of the VR ImagingDataCommons/highdicom#136

Open

mrbean-bremen added a commit to mrbean-bremen/pydicom that referenced this issue Apr 24, 2022

Add validation for numerical VR values

392360a

- see pydicom#1414

mrbean-bremen added a commit to mrbean-bremen/pydicom that referenced this issue Apr 24, 2022

Add validation for numerical VR values

a621839

- also add type validation for validated VRs - see pydicom#1414

mrbean-bremen mentioned this issue Apr 24, 2022

[MRG] Add validation for numerical VR values #1640

Merged

5 tasks

mrbean-bremen added a commit to mrbean-bremen/pydicom that referenced this issue Apr 24, 2022

Add validation for numerical VR values

cfcf437

- also add type validation for validated VRs - see pydicom#1414

darcymason added this to the v3.0 milestone Apr 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add VR type check when element values are being set #1414

Add VR type check when element values are being set #1414

scaramallion commented Jun 16, 2021 •

edited

mrbean-bremen commented Apr 19, 2022

scaramallion commented Apr 19, 2022 •

edited

mrbean-bremen commented Apr 20, 2022 •

edited

scaramallion commented Apr 20, 2022

mrbean-bremen commented Apr 21, 2022

darcymason commented Apr 21, 2022

mrbean-bremen commented Apr 21, 2022 •

edited

mrbean-bremen commented Apr 24, 2022 •

edited

darcymason commented Apr 25, 2022

mrbean-bremen commented Apr 25, 2022

mrbean-bremen commented May 1, 2022

scaramallion commented May 1, 2022 •

edited

darcymason commented May 2, 2022

mrbean-bremen commented May 2, 2022

darcymason commented May 2, 2022

mrbean-bremen commented May 2, 2022

Add VR type check when element values are being set #1414

Add VR type check when element values are being set #1414

Comments

scaramallion commented Jun 16, 2021 • edited

mrbean-bremen commented Apr 19, 2022

scaramallion commented Apr 19, 2022 • edited

mrbean-bremen commented Apr 20, 2022 • edited

scaramallion commented Apr 20, 2022

mrbean-bremen commented Apr 21, 2022

darcymason commented Apr 21, 2022

mrbean-bremen commented Apr 21, 2022 • edited

mrbean-bremen commented Apr 24, 2022 • edited

darcymason commented Apr 25, 2022

mrbean-bremen commented Apr 25, 2022

mrbean-bremen commented May 1, 2022

scaramallion commented May 1, 2022 • edited

darcymason commented May 2, 2022

mrbean-bremen commented May 2, 2022

darcymason commented May 2, 2022

mrbean-bremen commented May 2, 2022

scaramallion commented Jun 16, 2021 •

edited

scaramallion commented Apr 19, 2022 •

edited

mrbean-bremen commented Apr 20, 2022 •

edited

mrbean-bremen commented Apr 21, 2022 •

edited

mrbean-bremen commented Apr 24, 2022 •

edited

scaramallion commented May 1, 2022 •

edited