Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI: add better validation on workflows #681

Closed
josephjclark opened this issue May 1, 2024 · 7 comments · Fixed by #696
Closed

CLI: add better validation on workflows #681

josephjclark opened this issue May 1, 2024 · 7 comments · Fixed by #696
Assignees

Comments

@josephjclark
Copy link
Collaborator

josephjclark commented May 1, 2024

When loading a workflow JSON, the CLI should do more validation work, warning users when a hand-made workflow is incorrect.

The runtime actually has a validate-plan.ts file - it should be acceptable to just add extra validation rules here. They should bubble up to the CLI (for example, a workflow.json which specifies a start step not defined in the workflow will throw a validation error, ie { options: { start: "x" }, workflow: { steps: [ { id: "a" }] } }

Validation rules should include:

  • Check the overall structure: a workflow must have a workflow key at the root, and the workflow must have a steps key, which should be an array.
  • Log a warning if the steps array is empty
  • A step with an adaptor should also have an expression (steps without expressions are fine)
  • A step should only have the valid set of keys defined - any keys not found in the type reference should trigger a fail.
  • Log a warning if any unrecognised options are a passed on the options object

Plus anything else that may be useful

All validation rules should be thoroughly unit tested.

A big risk is that adding extra validation will break unit tests, which often delight in using invalid workflows. I think there's already a skipValidation option which we can pass to the runtime and we may need to add this. If this is breaking lots and lots and lots of tests we may want to re-think (perhaps validation should be opt-in, and we need to change the CLI to always pass a validate: true flag)

@josephjclark josephjclark added the good first issue Good for newcomers label May 15, 2024
@SatyamMattoo
Copy link
Contributor

SatyamMattoo commented May 19, 2024

Hey @josephjclark , While working on the issue, I came across a test in runtime.test.ts that includes data in the workflow steps. I'm a bit confused about this, as the types of Step | Job | Trigger do not include data. Typically, the data should be included inside the state.

Screenshot 2024-05-19 160641

To address this, I've considered adding more validation functions to thevalidate-plan.tsfile. My approach includes:

  1. Validation for the workflow
 const assertWorkflowStructure = (plan: ExecutionPlan) => {
  const { workflow, options } = plan;

  if (!workflow || typeof workflow !== 'object') {
    throw new ValidationError('Missing or invalid workflow key in execution plan.');
  }

  if (!Array.isArray(workflow.steps)) {
    throw new ValidationError('The workflow.steps key must be an array.');
  }

  if (workflow.steps.length === 0) {
    console.warn('Warning: The workflow.steps array is empty.');
  }

  workflow.steps.forEach((step, index) => {
    assertStepStructure(step, index);
  });

  if (options) {
    assertOptionsStructure(options);
  }
}; 
  1. Validation for the steps and the options:
const assertStepStructure = (step: Step | Job | Trigger, index: number) => {
  const allowedKeys = ['id', 'name', 'next', 'previous', 'adaptor', 'expression', 'state', 'configuration', 'linker'];

  Object.keys(step).forEach((key) => {
    if (!allowedKeys.includes(key)) {
      throw new ValidationError(`Invalid key "${key}" in step ${step.id || index}.`);
    }
  });
};

const assertOptionsStructure = (options: WorkflowOptions) => {
  const allowedKeys = ['timeout', 'stepTimeout', 'start', 'end', 'sanitize'];

  Object.keys(options).forEach((key) => {
    if (!allowedKeys.includes(key)) {
      console.warn(`Warning: Unrecognized option "${key}" in options object.`);
    }
  });
};

Also, I had a doubt about this statement "A step with an adaptor should also have an expression (steps without expressions are fine)". This basically means if the step has an adaptor, it should have an expression as well, but if an adaptor is missing there is no need for an expression. Have I interpreted it correctly?

This approach does not break any tests for the runtime except for the one mentioned. Could you please review this and let me know if this aligns with what you had in mind? Additionally, do you have any suggestions or improvements for this approach?

@josephjclark
Copy link
Collaborator Author

Hi @SatyamMattoo ,

That data property looks like an old thing that got missed when refactoring code. Please feel free to update the test!

See these validations implemented in code is making me a little anxious. I'd like to think about this a bit more after I've had my coffee. I'm worried we're being too strict - maybe this should go into the CLI, not the core runtime. I'm not sure - let me think it over.

if the step has an adaptor, it should have an expression as well, but if an adaptor is missing there is no need for an expression.

This is correct!

@SatyamMattoo
Copy link
Contributor

Thank you @josephjclark, I'll update the test accordingly.

Regarding the validations in the code, I completely understand your concern. If you have any further thoughts or suggestions, I will be more than happy to help.

@Pin4sf
Copy link

Pin4sf commented May 20, 2024

@josephjclark I am also looking into the issue and I think the approach provided by @SatyamMattoo of Custom Validation Functions is correct way to go as here we are validating the structure of the workflow, steps, and options in the execution plan.

I'm worried we're being too strict

Your concern is also valid so I think we can iterate over other approach which can be :-

  • Declarative Schema Validation: We can define a schema for our workflow using a library like Joi or Yup. The schema would define the expected structure of the workflow, including the types and structure of each field. It would then validate the workflow against this schema. This approach is declarative and can provide detailed error messages.

@josephjclark
Copy link
Collaborator Author

@SatyamMattoo can you open a PR with your work so far please?

@Pin4sf We heavily use JSON schema across our stack - if we were to introduce a schema here, that would be the way to go. But we already have a formal type declaration in TypeScript. I'd rather use that than duplicate it,

@Dharssini
Copy link

Dharssini commented May 21, 2024

@josephjclark I am currently working on this issue under C4GT, will raise a PR shortly

@josephjclark
Copy link
Collaborator Author

@Dharssini This is not a C4GT DMP issue. If you want to apply for one of those you have to apply to the portal directly.

This issue is being worked on by @SatyamMattoo and does not require more input. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants