nightly test with realistic data, performance measurements #8142

wardi · 2024-04-02T14:47:23Z

Let's improve the create-test-data command to generate a more typical count of users, groups, orgs and datasets. It should also upload generated test data with tens of columns and thousands of rows. Next we should create a benchmark-test-data command to exercise the UI and APIs to display, sort and query the generated data.

These commands should have an option to generate a detailed report with the time for each creation or query task.
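A per-task timing report like this could be collected with a small context manager. This is only a sketch: `TaskTimer`, its method names and the JSON layout are hypothetical, not part of CKAN, and the two timed tasks are stand-ins for real creation and query calls.

```python
import json
import time
from contextlib import contextmanager


class TaskTimer:
    """Collects wall-clock durations for named tasks (hypothetical helper)."""

    def __init__(self):
        self.records = []

    @contextmanager
    def task(self, name):
        # perf_counter is a monotonic clock intended for measuring intervals.
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the task raised an exception.
            self.records.append(
                {"task": name, "seconds": time.perf_counter() - start}
            )

    def report(self):
        """Return all recorded timings as a JSON string."""
        return json.dumps(self.records, indent=2)


timer = TaskTimer()

# Stand-in for a creation task: build 1000 placeholder rows in memory.
with timer.task("create placeholder rows"):
    rows = [{"id": i, "value": i * 2} for i in range(1000)]

# Stand-in for a query task: sort the rows by one column, descending.
with timer.task("sort placeholder rows"):
    rows.sort(key=lambda r: r["value"], reverse=True)

print(timer.report())
```

Keeping one record per task, rather than a single total, would let the report show which step got slower between runs.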

In our nightly build job we can collect these reports and add them to a GitHub Pages static site repo, along with the commit id and pip freeze output, to track performance for these realistic workloads over time, similar to https://speed.pypy.org/
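One possible shape for a nightly record that bundles the timings with the commit id and pip freeze output is sketched below. The function names and record fields are assumptions made for illustration; the helpers assume the job runs inside a git checkout with pip available, and the example call uses placeholder values rather than real measurements.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone


def current_commit():
    """Return the checked-out commit id (assumes a git checkout)."""
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()


def frozen_packages():
    """Return `pip freeze` output for the running interpreter, one line per package."""
    return subprocess.check_output(
        [sys.executable, "-m", "pip", "freeze"], text=True
    ).splitlines()


def build_record(timings, commit, packages):
    """Combine timings with the environment details needed to compare runs."""
    return {
        "generated": datetime.now(timezone.utc).isoformat(),
        "commit": commit,
        "pip_freeze": packages,
        "timings": timings,
    }


# Placeholder inputs, for illustration only.
record = build_record(
    timings=[{"task": "example task", "seconds": 0.5}],
    commit="0000000",
    packages=["example-package==1.0"],
)
print(json.dumps(record, indent=2))
```

In the nightly job the placeholders would be replaced with `current_commit()` and `frozen_packages()`, and the resulting JSON file committed to the static site repo.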

This automatic reporting will help us identify changes to CKAN's code, dependencies and environment that help or hurt performance.


jqnatividad · 2024-04-04T13:13:20Z

Instead of synthetic test data, we should snapshot real-world data from well-known sources, e.g.:

World Bank
UN
NYC's 311 and Taxi Data
Boston's CKAN Organizations
Canada's Open Data Portal
non-English content from other CKAN Sites (Saudi Arabia, Singapore, Japan, Africa, Argentina, Finland, etc.)

The sample data snapshot should be curated so that it exercises CKAN subsystems (e.g. different data types, date formats, UTF-8 encoding, languages, etc.).

wardi added the Good for Contribution label Apr 4, 2024
