Automatize multiple fields in scheming #364

Open
dsanjurjo opened this issue Mar 24, 2023 · 3 comments

Comments

@dsanjurjo

dsanjurjo commented Mar 24, 2023

Today, if you want to create a multi-valued field in scheming and facet on it, you have to define the field in the Solr schema and write an IPackageController plugin (a SingletonPlugin) with a custom "before_index" method that processes the values before they are sent to Solr for indexing (basically, it converts a JSON array into a Python list).
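For illustration, the hook described above might look roughly like this. It is a minimal sketch, not code from any extension: the class is a plain stand-in so the snippet runs without CKAN, the field name `keywords` is hypothetical, and it assumes the value reaches the hook as a JSON-encoded string.

```python
import json


class MultiValueIndexPlugin:
    """Stand-in for a CKAN plugin class.

    In a real extension this would subclass plugins.SingletonPlugin and
    declare plugins.implements(plugins.IPackageController, inherit=True).
    """

    # Hypothetical field name; it must also be declared as multiValued
    # in the Solr schema.
    MULTI_FIELD = "keywords"

    def before_index(self, pkg_dict):
        value = pkg_dict.get(self.MULTI_FIELD)
        # Assumption: the value arrives as a JSON string such as '["a", "b"]'.
        if isinstance(value, str):
            try:
                parsed = json.loads(value)
            except ValueError:
                return pkg_dict  # not JSON, leave the value untouched
            if isinstance(parsed, list):
                pkg_dict[self.MULTI_FIELD] = parsed
        return pkg_dict
```

Every multi-valued field needs its own entry in the Solr schema and its own handling in a hook like this one.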

IMHO this could be easier if we settle (in this very extension) on a prefix for multi-valued fields ("multival_", e.g.) and use it to create a dynamic field in Solr, something like ' <dynamicField name="multival_*" type="string" indexed="true" stored="true" docValues="true" multiValued="true" /> '. The extension would also need a generic before_index method to handle those fields (all those whose names begin with "multival_"), but that's not a big deal.

This way, if extension users define a field whose name starts with "multival_" in their scheming file (YAML or JSON), it will be treated as a multi-valued field and faceted accordingly out of the box.
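The generic handling proposed here could be sketched as follows. This is only an assumption about how it might work: `expand_multival_fields` is a hypothetical helper name, and the "multival_" prefix is the one suggested in this issue, not an existing convention in the extension.

```python
import json

# Prefix proposed in this issue; not an existing ckanext-scheming convention.
MULTIVAL_PREFIX = "multival_"


def expand_multival_fields(pkg_dict, prefix=MULTIVAL_PREFIX):
    """Return a copy of pkg_dict where JSON-array strings stored under
    keys starting with `prefix` are replaced by Python lists."""
    result = dict(pkg_dict)
    for key, value in pkg_dict.items():
        if not key.startswith(prefix) or not isinstance(value, str):
            continue
        try:
            parsed = json.loads(value)
        except ValueError:
            continue  # not valid JSON, keep the original string
        if isinstance(parsed, list):
            result[key] = parsed
    return result
```

A before_index hook in the extension could call this on the dict it receives, so that any field matching the dynamic field pattern is indexed as multi-valued without per-field code.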

I know the current way is not a big deal, but users ask a lot of questions about faceting multi-valued fields, and with these changes faceting would be easier with this extension.

@FrantisekPavlicek

Hello, I'm actually still struggling with this on my CKAN site. Could you please first describe the current solution for correctly displaying facets for multi-valued fields?

CKAN version: 2.10.0. Some methods have been renamed, but renaming the before_index method to before_dataset_index did not help. Also, I currently cannot rebuild the search index; it fails with: TypeError("before_dataset_index() missing 1 required positional argument: 'data_dict'")
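For reference, a plugin meant to load on both CKAN 2.9 and 2.10 can expose the hook under both names. The class below is a plain stand-in (not the real CKAN base class), and it only illustrates the naming; the plugin code behind the TypeError isn't shown here, so the error may have a different cause.

```python
class IndexHookPlugin:
    """Stand-in for a plugin implementing IPackageController."""

    def before_dataset_index(self, data_dict):
        # CKAN 2.10 name. `self` must be the first parameter because
        # CKAN calls the hook on a plugin instance.
        return data_dict

    # CKAN 2.9 name, kept as an alias of the same method.
    before_index = before_dataset_index
```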

@wardi
Contributor

wardi commented Mar 27, 2023

If we recommend that all multi-valued fields be called multival_*, then this leaks into the API for datasets on the site.

A better solution would be to add dynamic Solr schema support to CKAN, and have ckanext-scheming use that to properly index all metadata fields a site might have. Solr can add and remove fields through its API. Using static schemas with Solr is very much the old way of operating.
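As a rough sketch of what adding a field through the Solr Schema API involves, the snippet below builds (but does not send) the request. The core URL and the field name are placeholders, and `build_add_field_request` is a hypothetical helper, not something CKAN or ckanext-scheming provides.

```python
import json
import urllib.request


def build_add_field_request(solr_core_url, name, field_type="string", multi_valued=True):
    """Build (but do not send) a Solr Schema API request that adds a field."""
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": True,
            "stored": True,
            "multiValued": multi_valued,
        }
    }
    return urllib.request.Request(
        solr_core_url.rstrip("/") + "/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a call to `urllib.request.urlopen(req)`, and it only works when the Solr core uses a managed (mutable) schema rather than a hand-edited schema file.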

Dynamic schemas in CKAN would also make it easier to support swapping out Solr for a different search back-end.

There has been some discussion on the CKAN repo about adding this feature. Are you interested in helping out?

@dsanjurjo
Author

The solution proposed by Wardi could be the perfect answer, but it needs a lot of work to implement, and it will not be backwards compatible. What I've suggested is a provisional answer to this problem that could be implemented in a short while.

I'm not sure what the problem with the multival_* solution is. Wardi, you said that this will leak into the API, but I'm not sure exactly what the problem is. Please excuse me, I'm just a beginner with CKAN (and a poor English speaker).

FrantisekPavlicek, sorry. I'm struggling with CKAN 2.9, and I don't know the new methods in 2.10 (and barely those in 2.9 either!)
