Reduce storage required for indexing - stop writing sp_name, res_type, and sp_updated to hfj_spidx_* tables #5937

Open
volodymyr-korzh opened this issue May 15, 2024 · 0 comments · May be fixed by #5941
@volodymyr-korzh
Collaborator

Provide a storage-optimized config option to avoid writing unused data to our hfj_spidx_* index tables.

The tables that store indexing data (hfj_spidx_string, hfj_spidx_token, etc.) use a hashing strategy to combine the search parameter name (sp_name) and the resource type (res_type) into a single 8-byte hash_identity value. Queries never use the actual sp_name and res_type columns. Together with the sp_updated timestamp, these columns account for 20-100 bytes of operationally unused storage in every index entry.
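The hashing idea, and why the plain columns become redundant, can be illustrated with a short Java sketch. It is not HAPI FHIR's implementation: the issue does not name the hash function, so this example assumes SHA-256 from the JDK truncated to 8 bytes, and the names `HashIdentitySketch`, `hashIdentity`, and `redundantBytes` are hypothetical. The byte estimate assumes an 8-byte timestamp and ignores per-column and per-row overhead, so it only approximates the 20-100 byte range quoted above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: HAPI FHIR's real hash function and input layout may differ.
// Assumption: SHA-256 (always available in the JDK) truncated to 8 bytes.
final class HashIdentitySketch {

    private HashIdentitySketch() {}

    /** Combines resource type and search parameter name into one 64-bit value. */
    static long hashIdentity(String resType, String spName) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(resType.getBytes(StandardCharsets.UTF_8));
            // Separator byte so ("ab", "c") and ("a", "bc") hash different input.
            digest.update((byte) 0);
            digest.update(spName.getBytes(StandardCharsets.UTF_8));
            // Keep only the first 8 bytes of the digest, read as a signed long.
            return ByteBuffer.wrap(digest.digest(), 0, 8).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required by every JDK", e);
        }
    }

    /** Rough payload of the redundant columns: two UTF-8 strings plus an 8-byte timestamp. */
    static int redundantBytes(String resType, String spName) {
        return resType.getBytes(StandardCharsets.UTF_8).length
                + spName.getBytes(StandardCharsets.UTF_8).length
                + 8; // assumed timestamp width; per-column and row overhead ignored
    }

    public static void main(String[] args) {
        System.out.println(hashIdentity("Patient", "family"));
        System.out.println(redundantBytes("Patient", "family")); // prints 21
    }
}
```

For `("Patient", "family")` the estimate is 7 + 6 + 8 = 21 bytes; a longer parameter name such as `identifier` raises it to 25, and queries only ever need the single 8-byte hash.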

Acceptance Criteria

  • When the new storage-optimized indexing flag is on, the sp_name, res_type, and sp_updated columns should be null.
  • All queries should still work.

Solution Design

  • Add a new config item to StorageSettings: index-storage-optimized.
  • Add a migration that makes the sp_name, res_type, and sp_updated columns nullable.
  • When the setting is active, set sp_name, res_type, and sp_updated to null when saving index rows.
  • Ensure that index-row reuse uses only hash_exact for the equality check.
  • Ensure that reindex still works.
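The solution design above can be sketched in Java as follows. `StorageSettings`, `IndexRow`, `prepareForSave`, and `canReuse` here are hypothetical stand-ins written for this illustration, not HAPI FHIR's actual classes or method signatures. The sketch only shows the intended behavior: the flag defaults to off, the three redundant fields are cleared before saving when it is on, and row reuse compares only the hash.

```java
import java.time.Instant;

// Hypothetical stand-ins for the proposed setting and an hfj_spidx_* row.
// Names are illustrative, not HAPI FHIR's actual API.
final class IndexStorageSketch {

    /** Hypothetical config holder for the proposed index-storage-optimized flag. */
    static final class StorageSettings {
        private boolean indexStorageOptimized; // defaults to off

        boolean isIndexStorageOptimized() { return indexStorageOptimized; }

        void setIndexStorageOptimized(boolean value) { indexStorageOptimized = value; }
    }

    /** One index row; sp_name, res_type and sp_updated are nullable after the migration. */
    static final class IndexRow {
        String spName;
        String resType;
        Instant spUpdated;
        final long hashIdentity;
        final long hashExact;

        IndexRow(String spName, String resType, long hashIdentity, long hashExact) {
            this.spName = spName;
            this.resType = resType;
            this.spUpdated = Instant.now();
            this.hashIdentity = hashIdentity;
            this.hashExact = hashExact;
        }
    }

    private IndexStorageSketch() {}

    /** Called just before a row is saved: drops the redundant columns when the flag is on. */
    static void prepareForSave(IndexRow row, StorageSettings settings) {
        if (settings.isIndexStorageOptimized()) {
            row.spName = null;
            row.resType = null;
            row.spUpdated = null;
        }
    }

    /** Reuse check compares only hash_exact, so it does not depend on the nulled columns. */
    static boolean canReuse(IndexRow existing, IndexRow candidate) {
        return existing.hashExact == candidate.hashExact;
    }
}
```

Because `canReuse` ignores `spName`, `resType`, and `spUpdated`, a row written before the flag was enabled still matches a freshly computed row during `$reindex`; calling `prepareForSave` on the reused row then clears its old values, which is what the testing steps check.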

Testing

  • Set up a HAPI FHIR server with default settings.
  • Create some resources with strings and identifiers, e.g. a Patient with a family name and an identifier.
  • Verify that the database rows in hfj_spidx_string and hfj_spidx_token for that res_id have values in sp_name, res_type, and sp_updated.
  • Turn the new setting on.
  • Create another Patient with a family name and an identifier.
  • Verify that the database rows in hfj_spidx_string and hfj_spidx_token for that res_id have null for sp_name, res_type, and sp_updated.
  • Run $reindex for the first Patient created (Patient/[id]/$reindex).
  • Verify that the rows from earlier now have null values for sp_name, res_type, and sp_updated.
  • Ensure that search still works for all values.
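The database checks in the testing steps could be scripted with plain JDBC, as in the sketch below. It assumes a `java.sql.Connection` to the HAPI FHIR database is obtained elsewhere and that the tables expose `res_id`, `sp_name`, `res_type`, and `sp_updated` columns under those names; `SpidxVerifier` and its methods are hypothetical helpers, not part of HAPI FHIR.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Set;

// Hypothetical helper for the manual verification steps.
// Assumes a JDBC Connection to the HAPI FHIR database is provided by the caller.
final class SpidxVerifier {

    // Table names are concatenated into SQL, so only the tables in the test plan are allowed.
    private static final Set<String> TABLES = Set.of("hfj_spidx_string", "hfj_spidx_token");

    private SpidxVerifier() {}

    /** Counts index rows for one resource that still carry any of the redundant columns. */
    static String countPopulatedRowsSql(String table) {
        if (!TABLES.contains(table)) {
            throw new IllegalArgumentException("Unexpected table: " + table);
        }
        return "SELECT COUNT(*) FROM " + table
                + " WHERE res_id = ?"
                + " AND (sp_name IS NOT NULL OR res_type IS NOT NULL OR sp_updated IS NOT NULL)";
    }

    /** True when every row for the resource has null sp_name, res_type and sp_updated. */
    static boolean redundantColumnsAreNull(Connection connection, String table, long resId)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(countPopulatedRowsSql(table))) {
            statement.setLong(1, resId);
            try (ResultSet rs = statement.executeQuery()) {
                rs.next(); // COUNT(*) always returns exactly one row
                return rs.getLong(1) == 0;
            }
        }
    }
}
```

With the setting off, `redundantColumnsAreNull` should return `false` for the first Patient's `res_id`; after the setting is turned on, it should return `true` for the second Patient, and for the first Patient once `$reindex` has run. Search checks still need to go through the FHIR API.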