
Vs 299 spike speed up integration tests #8706

Draft
wants to merge 901 commits into master

Conversation

koncheto-broad

This was for a spike, so this draft PR is never going to be actually merged. But having it in this format will be helpful for whoever picks the work up next!

rsasch and others added 30 commits February 16, 2023 15:30
* VS-815: Add Support for YNG to VQSR Lite
* Up the memory of a task in JointVcfFiltering.wdl.
* Use 'HDD' rather than 'LOCAL' in JointVcfFiltering.wdl
* Update GvsCalculatePrecisionAndSensitivity.wdl to allow for the different scales of calibration_sensitivity vs. LOD score (see the sketch after this list).
Also retrieve the score from JointVcfFiltering and store it in BQ and in the VCF.
* deleted VDS
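The scale difference matters because the two score types are conventionally thresholded in opposite directions: VQSLOD is unbounded and higher is better, while calibration_sensitivity lives in [0, 1] and lower is better. A minimal Python sketch of that branching, assuming made-up field names and thresholds (an illustration of the convention, not GVS code):

```python
# Illustrative only: the field names, thresholds, and pass/fail conventions are
# assumptions about how the two score types are typically thresholded, not GVS code.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Site:
    vqslod: Optional[float] = None                   # classic VQSR score: unbounded, higher is better
    calibration_sensitivity: Optional[float] = None  # VQSR Lite score: in [0, 1], lower is better

def passes_filter(site: Site,
                  lod_threshold: float = 0.0,
                  sensitivity_threshold: float = 0.997) -> bool:
    """Apply whichever threshold matches the score the site actually carries."""
    if site.calibration_sensitivity is not None:
        # calibration_sensitivity: sites ABOVE the sensitivity cutoff are filtered out
        return site.calibration_sensitivity <= sensitivity_threshold
    if site.vqslod is not None:
        # VQSLOD: sites BELOW the LOD cutoff are filtered out
        return site.vqslod >= lod_threshold
    return False  # no score at all: treat as filtered
```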

* only one left
…tion of Delta (#8205)

* Lee's name

* add vds validation script written by Tim

* fix rd tim typo

* make sure temp dir is set and not default for validate()

* swap to consistent kebab case

Co-authored-by: Miguel Covarrubias <mcovarr@users.noreply.github.com>

* clean up validation

* put init in the right place

* add proper example to notes

* update code formatting

* update review

---------

Co-authored-by: Miguel Covarrubias <mcovarr@users.noreply.github.com>
* Don't run gatk tests when the only changes in a commit are in the scripts/variantstore directory.
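In effect, the gate collects the paths touched by the commit and skips the GATK test suite only when every one of them falls under scripts/variantstore/. A hedged Python sketch of that decision (the helper name and example paths are illustrative; the real check lives in the CI configuration):

```python
# Illustrative sketch of the "skip gatk tests" decision; the real logic lives
# in the repository's CI configuration, not in this helper.
from pathlib import PurePosixPath

def should_run_gatk_tests(changed_files: list) -> bool:
    """Run the GATK test suite unless every changed file is under scripts/variantstore/."""
    variantstore = PurePosixPath("scripts/variantstore")
    return not all(
        PurePosixPath(f).is_relative_to(variantstore) for f in changed_files
    )

# Example: a variantstore-only change skips the GATK tests.
assert should_run_gatk_tests(["scripts/variantstore/wdl/GvsUtils.wdl"]) is False
assert should_run_gatk_tests(["src/main/java/Foo.java", "scripts/variantstore/README.md"]) is True
```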
* laying framework for FOFN bulk import code

* adding in terra notebook utils code

* updating wdl

* updating environment variables to make this work better

* quotey McBetterQuotes

* extra environment variables

* normalizing variable name with other wdls that require it

* gotta explicitly set WORKSPACE_NAMESPACE to the Google project as well. Apparently.

* typoooooooooooooooooo

* Didn't pipe the output files the entire way up

* whoopsie

* typo

* three updates after testing:
1. We do NOT want to assume that the sample IDs we want are in the name field. Pass that through as a parameter.
2. We want to explicitly pause every 500 samples, since that's our page size. It slows our requests down enough that we don't spam the backend server and hit 503 errors, though it also slows how quickly we can write the files for very large datasets. That shouldn't be a concern: as long as it doesn't cause errors, it is still a hands-off process.
3. We want to account for heterogeneous data. In AoU Delta, for instance, the control samples keep their vcf and vcf_index data in a different field. This would have made the whole import fail if we weren't accounting for it explicitly; now we generate an errors.txt file holding each row for which we couldn't find the correct columns, so they can be examined later. (A sketch of this paging and error handling follows this list.)
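To make items 2 and 3 concrete, here is a rough Python sketch of the paging loop: pause after each 500-row page, pull the vcf/vcf_index paths from caller-supplied column names, and append any row missing those columns to errors.txt. The fetch_page helper, column names, and sleep interval are placeholders, not the actual terra-notebook-utils calls.

```python
# Rough illustration of the bulk-import paging described above; fetch_page and
# the column names are hypothetical stand-ins for the real data-table client.
import time

PAGE_SIZE = 500  # matches the data-table page size, so we pause once per page

def write_fofn(fetch_page, sample_id_column, vcf_column, vcf_index_column,
               fofn_path="sample_vcf_fofn.tsv", errors_path="errors.txt",
               pause_seconds=5):
    """Page through the data table, writing one TSV line per usable row."""
    page = 0
    with open(fofn_path, "w") as fofn, open(errors_path, "w") as errors:
        while True:
            rows = fetch_page(page, PAGE_SIZE)  # placeholder for the real paged query
            if not rows:
                break
            for row in rows:
                try:
                    fofn.write(f"{row[sample_id_column]}\t{row[vcf_column]}\t{row[vcf_index_column]}\n")
                except KeyError:
                    # Heterogeneous rows (e.g. control samples whose vcf columns
                    # have different names) are recorded for later inspection
                    # instead of failing the whole import.
                    errors.write(f"{row}\n")
            page += 1
            time.sleep(pause_seconds)  # back off between pages to avoid 503s
```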

* silly mistake copying the functioning code over from the workbook

* making the script more robust against specifying nonexistent columns in the data table and making the Python script's output slightly more informative

* increasing the size of the disk this is running on for the sake of efficiency (and handling larger callsets)

* Passing errors up

* update params

* short term testing (rate lim)

* make it only 25 shards!

* add workspace id scraping

* add workspace id scraping fixup

* this is not functioning--need to curl in the wdl
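The workspace-ID scraping amounts to asking the Terra/Firecloud workspace endpoint for the workspace record using the task's own credentials; in the WDL this becomes a curl call. A rough Python equivalent, where the endpoint path and response field are my best guesses rather than the exact call used here:

```python
# Approximate Python equivalent of the curl call hinted at above; the endpoint
# and response shape are assumptions about the Firecloud/Terra API.
import subprocess
import requests

def get_workspace_id(namespace: str, name: str) -> str:
    # Use the environment's gcloud credentials, as a WDL task would.
    token = subprocess.check_output(
        ["gcloud", "auth", "print-access-token"], text=True
    ).strip()
    resp = requests.get(
        f"https://api.firecloud.org/api/workspaces/{namespace}/{name}",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp.json()["workspace"]["workspaceId"]  # assumed response field
```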

* clean up VCFs so we don't run out of space

* add duplicates test to the shard loading

* clean up namespace prep
---------

Co-authored-by: Aaron Hatcher <hatcher@broadinstitute.org>
* Use the annotation 'AS_MQ' for indels.
… table (#8278)

* Remove the unneeded SCORE field from the filter_set_info_vqsr table
* Updated the docker images.
* add queries for testing mismatched sites and variants across possible duplicates
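The duplicate checks boil down to BigQuery queries that look for sample names loaded more than once so their sites and variants can be compared. A hedged sketch using the google-cloud-bigquery client; the dataset, table, and column names are guesses at the GVS schema, not the queries added in this commit:

```python
# Hedged sketch of a duplicate-sample check; sample_info/sample_name are
# assumptions about the GVS schema rather than the actual query text.
from google.cloud import bigquery

def find_possible_duplicates(project: str, dataset: str):
    client = bigquery.Client(project=project)
    query = f"""
        SELECT sample_name, COUNT(*) AS copies
        FROM `{project}.{dataset}.sample_info`
        GROUP BY sample_name
        HAVING COUNT(*) > 1
    """
    return [(row.sample_name, row.copies) for row in client.query(query).result()]
```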

* still need to wire these through

* plumb thru dup validation

* dockstore for testing

* update docker

* add xtrace

* better bool logic

* clean up bash

* okay, let's try ripping shit out to get this to work

* okay, let's put a few lines back

* ok that worked, let's swap for better errors

* short term remove clinvar

* review changes

* update docker

* explain removal of clinvar test
* Adding tests for ExtractCohortLite.
* Simple fix to have the header of the VAT tsv to use tab characters.
* Updated to latest version of VQSR Lite (from Master)
* Ported tests and files for VQSR Lite over
* Refactored VQSR Classic code into its own WDL

* Add support for VQSR Lite to GvsExtractCohortFromSampleNames.wdl
* Remove obsolete gatk override jar
RoriCremer and others added 30 commits February 20, 2024 13:17
* add a brief quota request template doc

* link quota request template

* add header stream info

* Bec's suggestions

* discuss load_data_batch

* header info

* Bec's formatting improvements

* add Aaron's calculation notes
* Add task to deduplicate the VAT table.
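Deduplicating the VAT table is essentially a "keep one row per key" query. A hedged BigQuery sketch using ROW_NUMBER(); the table name and the (vid, transcript) key are assumptions about the VAT schema, not the WDL task added in this commit.

```python
# Hedged sketch of VAT deduplication; the table name and key columns are
# assumptions about the VAT schema rather than the real WDL task.
from google.cloud import bigquery

def deduplicate_vat(project: str, dataset: str, table: str = "vat"):
    client = bigquery.Client(project=project)
    fq_table = f"{project}.{dataset}.{table}"
    query = f"""
        CREATE OR REPLACE TABLE `{fq_table}` AS
        SELECT * EXCEPT (row_num) FROM (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY vid, transcript ORDER BY vid) AS row_num
            FROM `{fq_table}`
        )
        WHERE row_num = 1
    """
    client.query(query).result()  # rewrites the table with one row per key
```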