ckanext-activity: performance improvements #8169
Draft
Fixes #7953
Problem: Dashboards take over 180 seconds to load for a logged-in normal user, and emailing takes over 60 minutes for fewer than 60 users receiving notifications. The cause is high CPU/disk utilization on Postgres from inefficient, data-heavy queries.
Steps to reproduce:
Have a couple of datasets that have been updating every 15 minutes for the last 10 years, creating 'activity' diffs. Have about 2.1 million rows in the activity table with the 'data' blob field fully populated.
Create a default user and 'follow' the datasets (or the org).
Visit the dashboard and see how long it takes to load.
Trigger email notifications on updated datasets/resources.
Example org activity stream: https://www.data.qld.gov.au/organization/environment-science-and-innovation
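To give a feel for the scale in the reproduction steps, here is a rough back-of-the-envelope sketch of how many activity rows a single dataset updated every 15 minutes would accumulate over 10 years, and how many such datasets it would take to reach about 2.1 million rows. The function and constant names are hypothetical and purely illustrative; they are not part of CKAN or ckanext-activity, and the estimate assumes one activity row per update with no gaps.

```python
# Back-of-the-envelope estimate only. All names here are illustrative
# and are NOT CKAN or ckanext-activity APIs.
UPDATES_PER_HOUR = 4      # one update every 15 minutes
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25    # average, including leap years
YEARS = 10


def activities_per_dataset(years: float = YEARS) -> int:
    """Rows one dataset would add, assuming one activity row per update."""
    return round(UPDATES_PER_HOUR * HOURS_PER_DAY * DAYS_PER_YEAR * years)


def datasets_needed(total_rows: int, years: float = YEARS) -> float:
    """How many such datasets would account for total_rows activity rows."""
    return total_rows / activities_per_dataset(years)


if __name__ == "__main__":
    print(activities_per_dataset())                 # rows per dataset
    print(round(datasets_needed(2_100_000), 1))     # datasets for ~2.1M rows
```

Under these assumptions a single dataset contributes roughly 350,000 rows, so a handful of frequently updated datasets is enough to reach the table size described above.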
Proposed fixes:
Speed up activity stream loading
ckanext-activity plugin:
These changes have been deployed and tested on www.data.qld.gov.au at commit qld-gov-au@88d932e, which sits on top of CKAN 2.10.4. This has allowed us to re-enable hourly email notifications. Since we moved from 2.9 to 2.10 and the activity table's 'data' column started to be filled with JSON blobs of the point-in-time history, the notifications had put too much CPU and disk load on the database and caused overlapping batch-layer cron runs.
Features:
Please [X] all the boxes above that apply
Co-Author: @ThrawnCA