Improve BigQuery support #310

Open
marchakov-ody opened this issue Nov 6, 2019 · 2 comments

Comments

@marchakov-ody

We are working on the PEDSnet ETL in GCP and are trying to see whether it is possible to run DQA in BigQuery. We managed to run the tool using the bigrquery package; however, most Level 2 reports return a vague "Invalid query" error, so we would like some insight from the contributors into the inner workings of the tool.
The documentation lists DatabaseConnector and SQLRender as dependencies, and there is already some work in progress on BigQuery support for these modules. Are they actually used anywhere in the DQA code?

Any help/feedback is appreciated.

@callahanc5
Contributor

Hi @marchakov-ody,

DatabaseConnector and SQLRender are no longer used in this package.

I am not familiar with bigrquery, but I would be happy to help debug this with you. Do any of the level 2 reports complete? Do you know which checks may be causing this? It may be that only one or a few level 2 checks are causing the failure.

Let me know and thank you for your interest in our DQA package.

@marchakov-ody
Author

Hi @callahanc5,

Thanks for your reply. Of the level 2 reports, only measurement_organism completed. The rest returned various error messages. We're investigating the possible causes, but an example log with errors is attached:
dqa.log

We also had problems with some level 1 reports. I managed to track one of the errors for Drug Exposure down to the retrive_dataframe_join_clause_group function, but I am still getting more errors in other checks. I am not even sure whether this is some sort of conflict with bigrquery, or just some of the packages like dplyr acting weird.

While we investigate on our side, would you kindly suggest the best way to communicate with you and other contributors about these issues?

Thank you.
