Dataset Loses Harvest Object on WAF file Timestamp Change #324

Open
Jin-Sun-tts opened this issue Feb 13, 2024 · 0 comments

related issue: GSA/data.gov#4505

Summary:

When the timestamp of a WAF source file changes without any actual content modification, the metadata information disappears from the UI.

The root cause is that the dataset's harvest_object_id is not updated to the new harvest_object_id.
This was confirmed through the following API calls:
/api/action/package_show?id=<package_id>
/api/action/package_search?q=id:<package_id>
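The check above can be scripted. The following is a minimal sketch, not the exact procedure used in testing: it assumes the harvest_object_id is exposed as an entry in the package's extras list, and the helper names (harvest_object_id, call_action, compare_harvest_object_ids) and the base_url argument are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def harvest_object_id(package):
    """Return the harvest_object_id from a package dict's extras, or None.

    Assumption: the value appears as an extras entry keyed "harvest_object_id".
    """
    for extra in package.get("extras", []):
        if extra.get("key") == "harvest_object_id":
            return extra.get("value")
    return None


def call_action(base_url, action, **params):
    """GET a CKAN action API endpoint and return its "result" payload."""
    url = f"{base_url}/api/action/{action}?{urlencode(params)}"
    with urlopen(url) as response:
        return json.load(response)["result"]


def compare_harvest_object_ids(base_url, package_id):
    """Return the IDs reported by package_show and package_search as a pair."""
    shown = call_action(base_url, "package_show", id=package_id)
    found = call_action(base_url, "package_search", q=f"id:{package_id}")
    searched = found["results"][0] if found["results"] else {}
    return harvest_object_id(shown), harvest_object_id(searched)
```

Calling compare_harvest_object_ids before and after a harvest run makes it easy to see whether either endpoint still reports the old ID.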

Additionally, testing on the most recent version of CKAN with only the ckanext-harvest and ckanext-spatial extensions replicated the problem.

Observations from Testing:

  1. Manually running ckan search-index rebuild <package_id> resolved the issue: the API calls above then returned the correct harvest_object_id.

  2. Found the code block that should refresh the Solr index:
    https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/base.py#L709C1-L710C70

    Testing the following code changes gave these results:
    Invoking package_update instead of package_index.index_package resolved the issue.
    OR
    Adding model.Session.commit() before invoking package_index.index_package also resolved the issue.
    Calling the index rebuild instead of package_index.index_package did not resolve the issue unless model.Session.commit() was called before the rebuild.
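The commit-before-index variant can be sketched as below. This is an illustration of the ordering only, not the actual patch: commit_then_index is a hypothetical helper, and the session and index objects are passed in so the sketch does not depend on a CKAN installation.

```python
def commit_then_index(session, package_index, package_dict):
    """Commit pending database changes, then index the package.

    Hypothetical helper: `session` is anything with commit(), and
    `package_index` is anything with index_package(package_dict).
    """
    # Flush the new harvest object reference to the database first, so the
    # indexing step does not work from uncommitted state.
    session.commit()
    package_index.index_package(package_dict)


# Assumed wiring inside a CKAN process (not executed here):
#   from ckan import model
#   from ckan.lib.search.index import PackageSearchIndex
#   commit_then_index(model.Session, PackageSearchIndex(), package_dict)
```

Whether the commit belongs at this point in the harvester, or whether package_update is the better fix, is the open question of this issue.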

It seems that the assumption that package_index.index_package doesn't need a database commit to refresh Solr isn't valid based on the tests conducted above.

Any alternative solutions to address this issue?
