Support Advanced Data Provider API #300

cmoesel · 2022-11-04T19:02:56Z

Currently, the DataProvider (a.k.a. PatientSource) API provides two functions:

```ts
export interface DataProvider {
  currentPatient(): PatientObject | undefined | Promise<PatientObject | undefined>;
  nextPatient(): PatientObject | undefined | Promise<PatientObject | undefined>;
}
```
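
To make the current contract concrete, here is a minimal sketch of how an engine might walk patients through this API. It is not cql-execution's actual executor. `SimplePatient`, `SimpleDataProvider`, `ArrayDataProvider`, and `forEachPatient` are hypothetical names, the patient is trimmed down to `getId()`, and the sketch assumes the engine starts with `currentPatient()` and then advances with `nextPatient()` until it gets `undefined`.

```typescript
// Hypothetical stand-in for PatientObject: only getId() is modeled here.
interface SimplePatient {
  getId(): string;
}

// Same shape as the current DataProvider, with SimplePatient in place of PatientObject.
interface SimpleDataProvider {
  currentPatient(): SimplePatient | undefined | Promise<SimplePatient | undefined>;
  nextPatient(): SimplePatient | undefined | Promise<SimplePatient | undefined>;
}

// Hypothetical in-memory provider. currentPatient() returns the patient at the
// cursor; nextPatient() advances the cursor and returns the new current patient
// (undefined once the list is exhausted).
class ArrayDataProvider implements SimpleDataProvider {
  private readonly patients: SimplePatient[];
  private index = 0;

  constructor(patients: SimplePatient[]) {
    this.patients = patients;
  }

  currentPatient(): SimplePatient | undefined {
    return this.patients[this.index];
  }

  nextPatient(): SimplePatient | undefined {
    this.index++;
    return this.patients[this.index];
  }
}

// Assumed engine loop: start at currentPatient(), then call nextPatient()
// until the provider returns undefined. `await` accepts both plain values and
// Promises, so sync and async providers share one code path.
async function forEachPatient(
  provider: SimpleDataProvider,
  visit: (patient: SimplePatient) => void
): Promise<void> {
  let patient = await provider.currentPatient();
  while (patient !== undefined) {
    visit(patient);
    patient = await provider.nextPatient();
  }
}

// Usage: collect the ID of every patient the provider yields.
const provider = new ArrayDataProvider([{ getId: () => "p1" }, { getId: () => "p2" }]);
const seen: string[] = [];
forEachPatient(provider, (p) => {
  seen.push(p.getId());
}).then(() => console.log(seen));
```

Awaiting a non-Promise value simply yields that value, which is what lets the union return types in DataProvider cover both synchronous and asynchronous implementations.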

The PatientObject, in turn, has this API:

```ts
export interface PatientObject extends RecordObject {
  findRecords(profile: string | null, retrieveDetails?: RetrieveDetails): RecordObject[] | Promise<RecordObject[]>;

  // The rest of these fields are inherited from RecordObject, but listed here for completeness
  get(field: any): any;
  getId(): any;
  getCode(field: any): any;
  getDate(field: any): any;
  getDateOrInterval(field: any): any;
  _is?(typeSpecifier: AnyTypeSpecifier): boolean;
  _typeHierarchy?(): AnyTypeSpecifier[];
}
```

For a long time, findRecords took only one argument: a string identifier. The optional retrieveDetails argument was added recently to support some more advanced use cases, but it is likely not widely implemented. In addition, _is and _typeHierarchy were only added in version 2.0, and they were added in a backward-compatible way so that existing data providers would not break.
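
Because these additions are optional, the engine has to feature-detect them on every record. Below is a minimal sketch of that kind of fallback logic, assuming a simplified `NamedType` in place of `AnyTypeSpecifier`; `LegacyRecord` and `checkIs` are hypothetical names, not cql-execution's actual code.

```typescript
// Simplified stand-in for AnyTypeSpecifier: only named types are modeled.
interface NamedType {
  name: string;
}

// Hypothetical record shape mirroring the current API, where both type-related
// methods are optional so that pre-2.0 providers keep working.
interface LegacyRecord {
  getId(): string;
  _is?(typeSpecifier: NamedType): boolean;
  _typeHierarchy?(): NamedType[];
}

// Hypothetical engine helper: prefer _is, fall back to _typeHierarchy, and
// return undefined when the record supports neither, so the caller can decide
// what an unanswerable "is" check should mean.
function checkIs(record: LegacyRecord, type: NamedType): boolean | undefined {
  if (typeof record._is === "function") {
    return record._is(type);
  }
  if (typeof record._typeHierarchy === "function") {
    return record._typeHierarchy().some((t) => t.name === type.name);
  }
  return undefined;
}

// Usage: a record that only reports its type hierarchy.
const observation: LegacyRecord = {
  getId: () => "obs-1",
  _typeHierarchy: () => [
    { name: "{http://hl7.org/fhir}Observation" },
    { name: "{http://hl7.org/fhir}DomainResource" },
  ],
};
console.log(checkIs(observation, { name: "{http://hl7.org/fhir}DomainResource" })); // true
```

The three-way result illustrates the cost of bolting features on for backward compatibility: every caller must handle the case where an older provider simply cannot answer.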

We should introduce a v2 data provider API that is more intentional in its design (vs. the current design, which had features tacked on in later versions). This provider should allow implementations to optionally support other components of the retrieve (such as code filtering) or to use other hints in the retrieve to perform narrower searches. This could improve performance by allowing filtering to happen server-side (or in a database) instead of in the cql-execution code.

The Java CQL Engine provides a RetrieveProvider API (here) that currently supports a single retrieve function that takes many arguments. We might consider something like that, although I propose a different approach below:

```ts
// Notional Data APIs... VERY subject to change!

// A CQL data model can support multiple contexts. While "Patient" is the most
// common, there may be other contexts such as "Practitioner", "Encounter", or
// model-specific contexts. In addition, every data model should support the
// "Unfiltered" context. A ContextProvider declares the models that it supports
// and allows the engine to ask for specific contexts for evaluation.
export interface ContextProvider {
  supportedModels(): { name: string, version: string, url: string }[];
  getContext(context: string): ContextDataProvider;
}

// A ContextDataProvider handles the data for a given context. At a minimum,
// it should declare its name and provide an asynchronous iterator for
// iterating over its entities (for example, iterating patients in the
// "Patient" context). In order to support related context retrieves, it may
// also provide a way to find an entity by key and value (e.g., patient ID).
// NOTE: The "Unfiltered" context typically has a single entity that supports
// retrieving data from the global (unfiltered) context.
export interface ContextDataProvider {
  context(): string;
  entities(): AsyncGenerator<ContextEntity>;
  findEntity?(key: string, value: any): Promise<ContextEntity | undefined>;
}

// A ContextEntity represents an entity that provides the basis for execution
// of the CQL statements in that context. Most contexts will have multiple
// entities. For example, in a "Patient" context, each entity represents a
// single patient against which the context statements should be executed.
// ContextEntities may also have their own properties and values.
export interface ContextEntity extends ModelData {
  retrieve(details: RetrieveDetails): Promise<RetrieveResult>;
}

// ModelData represents an arbitrary piece of data from a model. It must
// provide a unique ID to support reporting results, logging, and debugging.
// The engine can get properties of the data via a "get" function. The optional
// getTypeHierarchy and isType functions support CQL operations such as "is"
// and "as".
export interface ModelData {
  getId(): string;
  get(field: any, type?: 'Code' | 'DateOrInterval'): any | any[];
  getTypeHierarchy?(): AnyTypeSpecifier[];
  isType?(typeSpecifier: AnyTypeSpecifier): boolean;
}

// RetrieveResult not only returns the results, but also indicates how much of
// the RetrieveDetails the provider handled, as well as whether any errors were
// encountered. Providing this additional information allows the engine to
// determine whether it needs to perform any additional post-processing (like
// code filtering).
export interface RetrieveResult {
  handledTemplateId: boolean;
  handledCodeFilters: boolean;
  handledDateFilters: boolean;
  handledIncludes: boolean;
  results: RecordObject[];
  errors?: Error[];
}
```
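
To show how the handled* flags could drive engine-side post-processing, here is a minimal sketch under simplifying assumptions: records are plain objects with hypothetical templateId and codes fields rather than ModelData, `SimpleRetrieveDetails` models only a template ID and a code list, and the result carries only two of the four flags. `InMemoryEntity`, `hasAnyCode`, and `retrieveWithFallback` are hypothetical names. The entity narrows by template itself but reports that it did not apply code filters, so the engine does that step.

```typescript
// Hypothetical, simplified stand-ins for the notional types above.
interface Coding {
  system: string;
  code: string;
}
interface SimpleRecord {
  id: string;
  templateId: string;
  codes: Coding[];
}
interface SimpleRetrieveDetails {
  templateId?: string;
  codes?: Coding[];
}
interface SimpleRetrieveResult {
  handledTemplateId: boolean;
  handledCodeFilters: boolean;
  results: SimpleRecord[];
  errors?: Error[];
}
interface SimpleEntity {
  retrieve(details: SimpleRetrieveDetails): Promise<SimpleRetrieveResult>;
}

// A provider-side entity that filters by template itself (as a server or
// database query might) but leaves code filtering to the engine.
class InMemoryEntity implements SimpleEntity {
  private readonly records: SimpleRecord[];

  constructor(records: SimpleRecord[]) {
    this.records = records;
  }

  async retrieve(details: SimpleRetrieveDetails): Promise<SimpleRetrieveResult> {
    const results = this.records.filter(
      (r) => details.templateId === undefined || r.templateId === details.templateId
    );
    return { handledTemplateId: true, handledCodeFilters: false, results };
  }
}

// True if the record carries at least one of the requested codes.
function hasAnyCode(record: SimpleRecord, codes: Coding[]): boolean {
  return record.codes.some((c) => codes.some((w) => w.system === c.system && w.code === c.code));
}

// Engine side: surface provider errors, then apply only the filters the
// provider reported it did not handle.
async function retrieveWithFallback(
  entity: SimpleEntity,
  details: SimpleRetrieveDetails
): Promise<SimpleRecord[]> {
  const result = await entity.retrieve(details);
  if (result.errors !== undefined && result.errors.length > 0) {
    throw result.errors[0];
  }
  let records = result.results;
  const { templateId, codes } = details;
  if (!result.handledTemplateId && templateId !== undefined) {
    records = records.filter((r) => r.templateId === templateId);
  }
  if (!result.handledCodeFilters && codes !== undefined) {
    records = records.filter((r) => hasAnyCode(r, codes));
  }
  return records;
}

// Usage: the entity narrows to Conditions; the engine applies the code filter.
const entity = new InMemoryEntity([
  { id: "c1", templateId: "Condition", codes: [{ system: "http://snomed.info/sct", code: "44054006" }] },
  { id: "c2", templateId: "Condition", codes: [{ system: "http://snomed.info/sct", code: "38341003" }] },
  { id: "o1", templateId: "Observation", codes: [{ system: "http://loinc.org", code: "4548-4" }] },
]);
retrieveWithFallback(entity, {
  templateId: "Condition",
  codes: [{ system: "http://snomed.info/sct", code: "44054006" }],
}).then((records) => console.log(records.map((r) => r.id)));
```

Under this scheme, reporting false for a flag is always safe because the engine re-applies that filter; reporting true without actually filtering would silently produce wrong results, which is part of why the reporting format raised in the open questions matters.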

This design definitely deserves discussion and iteration. Some open questions:

- Should the engine support multiple ContextProviders in order to support CQL that uses multiple models? Or should it allow only a single ContextProvider and expect integrators to aggregate multiple ContextProviders into a single instance?
- If the engine asks the ContextProvider for a context it does not support, should the ContextProvider throw or just return null?
- Should a ContextDataProvider declare its capabilities (such as whether or not it can do code filtering)? On one hand, that would be helpful. On the other hand, it cannot be taken as a guarantee, because there could still be specific value sets or code systems that it doesn't handle.
- Is "Entity" the right word for the thing a Context iterates over?
- How should a RetrieveResult indicate what it did and did not handle? I've initially designed it as a small set of booleans, but it also could have been another object (with the same small set of booleans), a list of the retrieve properties it processed (with any properties not in the list assumed to be unprocessed), or a list of the retrieve properties it couldn't process. Communicating what was and wasn't done is important, but there are so many ways to do it!
- I don't remember why we had both _typeHierarchy and _is. If the engine has the type hierarchy, can't it use that to evaluate "is"?
- The type hierarchy is actually more of an artifact of the model than of the provider. If the provider gave the engine a copy of the model-info it implements, then the engine could use that, and the only thing the data would need to report is its direct type. But is that worth the trouble?
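
The last two questions suggest that "is" could be derived rather than asked of the data. A minimal sketch of that idea, assuming a hypothetical `BaseTypeMap` (type name to base-type name) extracted from the model-info, so each record would only need to report its direct type. `typeHierarchy` and `isType` are illustrative names, and the sketch handles only named types, not list, interval, choice, or tuple type specifiers.

```typescript
// Hypothetical, simplified view of a model-info: each type name maps to the
// name of its base type (or undefined for a root type).
type BaseTypeMap = Record<string, string | undefined>;

// Walks from a record's direct type up through its base types. The indexOf
// check stops the walk if a malformed map contains a cycle.
function typeHierarchy(directType: string, baseTypes: BaseTypeMap): string[] {
  const hierarchy: string[] = [];
  let current: string | undefined = directType;
  while (current !== undefined && hierarchy.indexOf(current) === -1) {
    hierarchy.push(current);
    current = baseTypes[current];
  }
  return hierarchy;
}

// "is" falls out of the hierarchy: data is of type T if T appears in it.
function isType(directType: string, target: string, baseTypes: BaseTypeMap): boolean {
  return typeHierarchy(directType, baseTypes).indexOf(target) !== -1;
}

// Usage with a tiny excerpt of the FHIR resource hierarchy.
const fhirBaseTypes: BaseTypeMap = {
  "FHIR.Observation": "FHIR.DomainResource",
  "FHIR.DomainResource": "FHIR.Resource",
  "FHIR.Resource": undefined,
};
console.log(isType("FHIR.Observation", "FHIR.Resource", fhirBaseTypes)); // true
console.log(isType("FHIR.Resource", "FHIR.Observation", fhirBaseTypes)); // false
```

With the hierarchy coming from the model rather than the provider, a provider-side _typeHierarchy could become redundant, and _is would reduce to a membership check.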
