If you want DQM to run on an automated schedule:

- Navigate to Cloud Workflows.
- Select the `dqm_trigger` workflow and click "Edit".
- Add a new "Cloud Scheduler" trigger.
- If prompted, review the Cloud Scheduler API prompt and click "Enable".
- Give the trigger a name and choose the region that matches your workflow.
- Determine an appropriate cron schedule for the chosen timezone, using the help tooltips.
- Click "Continue" and leave the workflow argument empty (`{}`).
- Select "All calls" for the workflow call log level.
- Select the "DQM Service Account" (`tfvars` default: `dqm-account@<project-id>.iam.gserviceaccount.com`).
- Click "Next" and save the trigger.
- Click "Next" to proceed to the workflow definition page.
- Click "Deploy" to save and re-deploy the workflow with a trigger.
- DQM will now automatically run on a schedule.
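The console steps above can also be scripted. A minimal sketch using the `gcloud` CLI, assuming a daily 06:00 UTC schedule and the default service account name; the job name `dqm-trigger-schedule` is illustrative, and `PROJECT_ID`/`REGION` are placeholders for your own values:

```shell
# Create a Cloud Scheduler job that starts the dqm_trigger workflow daily.
# PROJECT_ID and REGION are placeholders - substitute your own values.
gcloud scheduler jobs create http dqm-trigger-schedule \
  --project="PROJECT_ID" \
  --location="REGION" \
  --schedule="0 6 * * *" \
  --time-zone="Etc/UTC" \
  --uri="https://workflowexecutions.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/workflows/dqm_trigger/executions" \
  --message-body='{"argument": "{}"}' \
  --oauth-service-account-email="dqm-account@PROJECT_ID.iam.gserviceaccount.com"
```

The empty workflow argument `{}` is passed through the `argument` field of the request body, mirroring the empty input used in the console flow.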
If you want to run DQM manually:

Note: If you are using the DQM Webapp, you can visit the Workflow page in the webapp to trigger or view all workflow executions. You can also view the full log in case of a failure.
- Navigate to Cloud Workflows.
- Select the `dqm_trigger` workflow and click "Execute".
- Leave the input empty (`{}`) and select "All calls" for the log level.
- Click "Execute" to start the DQM run.
- You can observe the workflow logs within the UI.
- Once completed, you can view the output logs in BigQuery.
Note: If you are using the DQM Webapp, you can visit the Rule Violations page in the webapp to view all the logs for your provided project, dataset, and log table name.
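If you prefer the command line to the console, the same one-off run can be started with `gcloud`; `PROJECT_ID` and `REGION` below are placeholders for your own values:

```shell
# Start a manual DQM run with an empty input and wait for the result.
gcloud workflows run dqm_trigger \
  --project="PROJECT_ID" \
  --location="REGION" \
  --data='{}'
```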
DQM outputs extensive logging, which can be leveraged for notifications or dashboards. If you specify a `log_table`, the logs are stored in BigQuery; otherwise, they go to Cloud Logging. The logged fields are described below:
Required:

- `dqm_version_id`: DQM release version
- `workflow_execution_id`: Cloud Workflow execution ID
- `run_timestamp_utc`: Timestamp (in UTC) of when DQM started
- `project_id`: GCP source project ID
- `dataset_id`: BigQuery source dataset ID
- `table_name`: BigQuery source table name
- `full_table_id`: Full BigQuery table ID (`project_id.dataset_id.table_name`)
- `log_type`: One of (system, parser, rule) depending on the error
- `column`: Name of the column being processed
- `error`: Error message provided for the issue
Nullable:

- `parser`: Name of the parser, when `log_type` is parser
- `rule`: Name of the rule, when `log_type` is rule
- `rule_params`: Arguments passed to the rule, when `log_type` is rule
- `value`: Data value causing the failure, when `log_type` is not system
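As an example of consuming these fields, here is a sketch of a BigQuery query that counts rule violations per column; the table reference `PROJECT_ID.dqm_logs.log_table` is a placeholder for whatever you configured as the `log_table`:

```shell
# Summarise rule violations per column and rule from the DQM log table.
# The table reference below is a placeholder for your configured log_table.
bq query --use_legacy_sql=false '
  SELECT column, rule, COUNT(*) AS violations
  FROM `PROJECT_ID.dqm_logs.log_table`
  WHERE log_type = "rule"
  GROUP BY column, rule
  ORDER BY violations DESC'
```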
Note: Management of notifications and alerting is not yet available in the Webapp.
By default, DQM deploys alerting policies, which can be enabled during deployment by changing `enable_notifications` in the `tfvars` file. Policies can also be enabled or disabled individually through the GCP UI under Monitoring > Alerting. DQM deploys the following policies:
- [DQM] Workflow execution error: Triggered when an error/warning occurs in the Cloud Workflow
- [DQM] Cloud Function execution error: Triggered when an error occurs in the Cloud Function
- [DQM] Rule or Parser violations: Triggered when a rule is violated or a parsing failure occurred
If enabled, e-mail alerts are sent to the `notification_email` in the `tfvars` file, with `notification_period` as the minimum time between notifications to prevent excessive alerting in the case of an error.
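For reference, a sketch of the relevant `tfvars` entries; the values shown are purely illustrative, and the expected format of `notification_period` depends on how the variable is defined in your deployment:

```hcl
# Illustrative values only - adjust to your deployment.
enable_notifications = true
notification_email   = "alerts@example.com"
notification_period  = "3600s"
```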