nightly test with realistic data, performance measurements #8142

Open
wardi opened this issue Apr 2, 2024 · 1 comment

Comments

@wardi
Contributor

wardi commented Apr 2, 2024

Let's improve the create-test-data command to generate a more typical count of users, groups, orgs and datasets. It should also upload generated test data with tens of columns and thousands of rows. Next we should create a benchmark-test-data command to exercise the UI and APIs to display, sort and query the generated data.
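As a rough illustration of the "tens of columns and thousands of rows" idea, here is a minimal sketch of a deterministic CSV generator using only the standard library. The function names (`generate_rows`, `write_csv`), the column naming and the mix of column types are assumptions made for this example; they are not part of the existing create-test-data command.

```python
import csv
import io
import random
from datetime import date, timedelta


def generate_rows(n_rows, n_cols, seed=0):
    """Yield a header row followed by n_rows data rows (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so nightly runs use identical data
    yield ["id"] + [f"col_{i}" for i in range(1, n_cols)]
    start = date(2020, 1, 1)
    for row_id in range(n_rows):
        row = [row_id]
        for col in range(1, n_cols):
            kind = col % 3  # rotate through integer, float and date columns
            if kind == 0:
                row.append(rng.randint(0, 1_000_000))
            elif kind == 1:
                row.append(round(rng.uniform(0, 1000), 2))
            else:
                offset = timedelta(days=rng.randint(0, 1500))
                row.append((start + offset).isoformat())
        yield row


def write_csv(n_rows, n_cols, seed=0):
    """Return the generated table as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(generate_rows(n_rows, n_cols, seed))
    return buf.getvalue()


csv_text = write_csv(n_rows=5000, n_cols=30)
```

Seeding the generator keeps the data identical between runs, so timing differences are more likely to come from code or dependency changes than from the data itself.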

These commands should have an option to generate a detailed report with the time for each creation or query task.
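One possible shape for such a report, sketched with the standard library only: a context manager records the wall-clock time of each named task and the result is dumped as JSON. The `TimingReport` class, its field names and the task names are hypothetical and only meant to show the idea.

```python
import json
import time
from contextlib import contextmanager


class TimingReport:
    """Collects per-task durations (hypothetical helper, not a CKAN API)."""

    def __init__(self):
        self.tasks = []

    @contextmanager
    def task(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the task raised an exception.
            elapsed = time.perf_counter() - start
            self.tasks.append({"task": name, "seconds": elapsed})

    def to_json(self):
        return json.dumps({"tasks": self.tasks}, indent=2)


report = TimingReport()
with report.task("create datasets"):
    sum(range(100_000))  # stand-in for the real creation work
with report.task("query datasets"):
    sorted(range(100_000), reverse=True)  # stand-in for a real query

report_json = report.to_json()
```

`time.perf_counter()` is used because it is monotonic and intended for measuring short durations.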

In our nightly build job we can collect these reports and add them to a GitHub Pages static site repo, along with the commit id and pip freeze output, to track performance for these realistic workloads over time, similar to https://speed.pypy.org/
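A sketch of how the nightly job might bundle a report with the commit id and `pip freeze` output before publishing it. It assumes the job runs inside a git checkout with `git` on the PATH; the helper names and the record layout are made up for illustration.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone


def current_commit_id():
    """Return the checked-out commit hash (assumes a git checkout)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def pip_freeze():
    """Return installed packages as a list of requirement lines."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def build_record(tasks, commit_id, packages, when=None):
    """Combine timings with the environment details (hypothetical layout)."""
    when = when or datetime.now(timezone.utc)
    return {
        "commit": commit_id,
        "recorded_at": when.isoformat(),
        "packages": sorted(packages),
        "tasks": tasks,
    }


def nightly_record(tasks):
    """What the nightly job would call; serialise with json.dumps()."""
    return build_record(tasks, current_commit_id(), pip_freeze())
```

The nightly job could write `json.dumps(nightly_record(tasks))` to one file per run in the static site repo, so that each data point can be traced back to a commit and a set of installed packages.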

This automatic reporting will help us identify changes to CKAN's code, dependencies and environment that help or hurt performance.

@jqnatividad
Contributor

jqnatividad commented Apr 4, 2024

Instead of synthetic test data, we should snapshot real-world data from well-known sources, e.g.:

  • World Bank
  • UN
  • NYC's 311 and Taxi Data
  • Boston's CKAN Organizations
  • Canada's Open Data Portal
  • non-English content from other CKAN Sites (Saudi Arabia, Singapore, Japan, Africa, Argentina, Finland, etc.)

The sample data snapshot should be curated so that it can exercise CKAN subsystems (e.g. different data types, date formats, UTF-8 encoding, languages, etc.)
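One way to check that a curated snapshot actually covers these cases is a small script that reports which kinds of values appear in a CSV file. The sketch below is only an illustration: the `classify` and `coverage` helpers, the category labels and the list of date formats are assumptions, and a real check would likely need more formats and encodings.

```python
import csv
import io
from datetime import datetime

# Illustrative date formats only; real snapshots may use others.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def classify(value):
    """Return a coarse label for one cell value (hypothetical categories)."""
    value = value.strip()
    if not value:
        return "empty"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return f"date:{fmt}"
        except ValueError:
            continue
    return "text" if value.isascii() else "text:non-ascii"


def coverage(csv_text):
    """Return the set of labels found in the data rows of a CSV document."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return {classify(cell) for row in reader for cell in row}


sample = "id,amount,when,name\n1,2.5,2024-04-02,東京\n2,3,02/04/2024,Boston\n"
found = coverage(sample)
```

Comparing `found` against a required set of labels would flag a snapshot that, for example, contains no non-ASCII text or only one date format.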
