Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validators ignored for fields with repeating subfields #394

Open
themowski opened this issue Dec 5, 2023 · 0 comments
Open

Validators ignored for fields with repeating subfields #394

themowski opened this issue Dec 5, 2023 · 0 comments

Comments

@themowski
Copy link
Contributor

Environment

CKAN version: 2.10.1
ckanext-scheming version: release-3.0.0

Description

Our CKAN implementation is being used to index datasets across a variety of domains. We have a schema that defines a core set of fields that all datasets need to specify. To support flexibility for dataset authors to add additional key-value metadata that is outside of the required core set of fields, our schema also has a field called custom_fields that uses repeating_subfields to allow users to specify additional key-value pairs to associate with their dataset. The key subfield has a validator that makes sure that the key's name meets certain requirements:

- field_name: custom_fields
  label: Custom Fields
  repeating_label: Custom Field
  repeating_subfields:
  - field_name: key
    label: Key
    validators: key_validator
    required: true
  - field_name: value
    label: Value
    required: true

We want to block users from specifying the same key value in multiple entries in custom_fields. The obvious way to do this is to define a validator function, keys_are_unique(), that can be applied to custom_fields to verify that the keys are unique.

- field_name: custom_fields
  label: Custom Fields
  validators: keys_are_unique
  repeating_label: Custom Field
  repeating_subfields:
  - field_name: key
    label: Key
    validators: key_validator
    required: true
  - field_name: value
    label: Value
    required: true

After trying this & spending some time in my debugger, I discovered that the extension initially saves the top-level field's validator, but then replaces it with the subfields' validator(s). When this happens, the data type of the validators list also changes from list[str] to dict. I haven't had time yet to dig into why/how this works in the grand scheme of CKAN dataset creation, but that's probably my next step.

One example of this behavior is in the _field_validators() function in ckanext/scheming/plugins.py:

def _field_validators(f, schema, convert_extras):
"""
Return the validators for a scheming field f
"""
if 'validators' in f:
validators = validation.validators_from_string(
f['validators'],
f,
schema
)
elif helpers.scheming_field_required(f):
validators = [not_empty]
else:
validators = [ignore_missing]
if convert_extras:
validators.append(convert_to_extras)
# If this field contains children, we need a special validator to handle
# them.
if 'repeating_subfields' in f:
validators = {
sf['field_name']: _field_validators(sf, schema, False)
for sf in f['repeating_subfields']
}
return validators

As a user, this behavior is unexpected -- I would expect that validators can be applied to both the top-level & sub-level fields. I think that it's worth updating the documentation for repeating_subfields and/or validators to explain that if a field has repeating_subfields, then validators for that field are ignored in favor of validators defined for the subfields.

I also have two specific questions, any insight would be appreciated:

  1. We are trying to allow users to add an arbitrary number of "extra" key-value pairs to a dataset that is defined by a ckanext-scheming schema. Is our current approach (custom_fields field with repeating_subfields) the "correct" way to do what we are trying to do, or is there another method that is better supported, either by ckanext-scheming or by CKAN itself?

  2. Assuming that what we are doing is correct, roughly how difficult would it be to modify ckanext-scheming to support validation of both the subfields and the top-level field? Any tips on what we would need to do to support that? It almost seems like this would require changes to the way the data is modelled, which I assume would be nontrivial.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant