Module/ichorcna/1.0 #178

Jwong684 · 2021-04-01T01:31:14Z

Pull Request Checklists

Important: When opening a pull request, keep only the applicable checklist and delete all other sections.

Checklist for New Module

Required

If applicable

I added more granular output subdirectories.
I added rules to the reference_files workflow to generate any new reference files.
I added subdirectories with large intermediate files to the list of scratch_subdirectories in the default.yaml configuration file.
I updated the list of available wildcards for the input files in the default.yaml configuration file.

Checklist for Updated Module

To be completed.

rdmorin

One big thing I'd like to raise: why is the ichorCNA source code seemingly bundled with this module? The module should install ichorCNA automatically rather than contain all of its source code.

rdmorin · 2021-04-28T16:51:33Z

demo/Snakefile

@@ -83,4 +84,5 @@ rule all:
        rules._strelka_all.input,
        rules._bwa_mem_all.input,
        rules._liftover_all.input,
-        rules._controlfreec_all.input
+        rules._controlfreec_all.input,
+        rules._ichorcna_all.input


add a newline after this

rdmorin · 2021-04-28T16:51:50Z

demo/config.yaml

+    ichorcna:
+        inputs:
+            sample_bam: "data/{sample_id}.bam"
+            sample_bai: "data/{sample_id}.bam.bai"


Add a newline at the end if this is the last line

rdmorin · 2021-04-28T16:52:35Z

modules/ichorcna/1.0/config/default.yaml

+            readcounter:
+                readCounterScript: "{MODSDIR}/src/readCounter"
+                chrs:
+                    hg19: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y"


Ideally we wouldn't rely on the user to specify the chromosomes this way but I can live with this for now.

rdmorin · 2021-04-28T16:53:13Z

modules/ichorcna/1.0/config/default.yaml

+                    hs37d5: "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y"
+                    hg38: "chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY"
+                    grch38: "chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY"
+                qual: 20


Explain with a comment what this does.
e.g.
qual: 20 #set the minimum mapping quality (or whatever this actually means)

rdmorin · 2021-04-28T16:54:09Z

modules/ichorcna/1.0/config/default.yaml

+                    "500000": "{MODSDIR}/src/inst/extdata/HD_ULP_PoN_{genome_build}_500kb_median_normAutosome_median.rds"
+                # must use gc wig file corresponding to same binSize (required)
+                ichorCNA_gcWig:
+                    "1000000": "{MODSDIR}/src/inst/extdata/gc_{genome_build}_1000kb.wig"


Does this genome_build naming match the one we use? I assume it does since this was run in GAMBL, right?

Yes. Unfortunately, ichorCNA's GitHub repo is messy and inconsistent in its naming conventions. In my original version, I had to manually rename some of their reference files to fit this format. In the current version, there's one rule with a bunch of symlinks that renames the reference files so they fit the downstream rules.

rdmorin · 2021-04-28T16:56:54Z

modules/ichorcna/1.0/ichorcna.smk

+
+### Directories ###
+
+resultsDir = "results/"


This isn't a standard for LCR-modules. What is the purpose/benefit of this?

I left them in by accident when I was initially drafting the module. They're not used anywhere in the module, and they've now been removed.

rdmorin · 2021-04-28T16:58:02Z

modules/ichorcna/1.0/ichorcna.smk

+CFG["runs"]["binSize"] = str(CFG["options"]["readcounter"]["binSize"])
+
+# Symlinks the input files into the module results directory (under '00-inputs/')
+rule _ichorcna_input_bam:


Does it work on crams?

No, unfortunately readCounter does not work on CRAMs. readCounter is built on bamtools, which was never updated to support CRAMs, though it seemed like a popular request (pezmaster31/bamtools#149).

rdmorin · 2021-04-28T16:58:25Z

modules/ichorcna/1.0/ichorcna.smk

+rule _run_ichorcna:
+    input:
+        tum = CFG["dirs"]["readDepth"] + "{seq_type}--{genome_build}/{binSize}/{tumour_id}.bin{binSize}.wig",
+        # norm = CFG["dirs"]["readDepth"] + "{seq_type}--{genome_build}/{binSize}/{normal_id}.bin{binSize}.wig"


Delete this and any other extraneous lines

rdmorin · 2021-04-28T16:59:05Z

modules/ichorcna/1.0/ichorcna.smk

+        seg = CFG["dirs"]["seg"] + "{seq_type}--{genome_build}/{binSize}/{tumour_id}--{normal_id}--{pair_status}/{tumour_id}.seg",
+        plot = CFG["dirs"]["seg"] + "{seq_type}--{genome_build}/{binSize}/{tumour_id}--{normal_id}--{pair_status}/{tumour_id}/{tumour_id}_genomeWide.pdf",
+        #rdata = "results/ichorCNA/{sample_id}/{sample_id}.RData"
+    params:


If you name your params identically to the variable names you want to use, all of this is redundant, because you can use unpacking to set them all (I think).

Not sure I follow here
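
One possible reading of the unpacking suggestion, sketched in plain Python rather than in Snakemake syntax: if the option names in the config match the command-line flags exactly, the flags can be generated from the dictionary instead of being written out one by one. The helper options_to_flags and the example dictionary are hypothetical. Only --chrs and the minMapScore value of 0.75 appear in this thread, and the spelling of the other flag is an assumption.

```python
import shlex

def options_to_flags(options):
    """Turn {"name": value} pairs into a "--name value" command-line fragment."""
    parts = []
    for name, value in options.items():
        # shlex.quote protects values such as R vectors that contain quotes
        parts.append(f"--{name} {shlex.quote(str(value))}")
    return " ".join(parts)

# Hypothetical options block, keyed by the exact flag names (assumed)
ichorcna_options = {
    "minMapScore": 0.75,
    "chrs": "c('1','2','X')",
}

flags = options_to_flags(ichorcna_options)
print(flags)
```

With this layout, adding a new option to the config would not require touching the rule, at the cost of the config keys having to match the tool's flags exactly.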

rdmorin · 2021-04-28T16:59:51Z

modules/ichorcna/1.0/ichorcna.smk

+
+# Perform some clean-up tasks, including storing the module-specific
+# configuration on disk and deleting the `CFG` variable
+op.cleanup_module(CFG)


add newline

Kdreval · 2021-04-28T17:07:19Z

modules/ichorcna/1.0/ichorcna.smk

+        bai = CFG["dirs"]["inputs"] + "bam/{seq_type}--{genome_build}/{sample_id}.bam.bai", # specific to readCounter
+        crai = CFG["dirs"]["inputs"] + "bam/{seq_type}--{genome_build}/{sample_id}.bam.crai" 
+    run:
+        op.relative_symlink(input.bam, output.bam)


We have started to use op.absolute_symlink here instead of op.relative_symlink. This will also need the check for the oncopipe version; you can find an example in one of the recent modules (battenberg/1.1, pathseq/1.0, etc.).

Changed bam inputs to op.absolute_symlink

Kdreval · 2021-04-28T17:09:36Z

modules/ichorcna/1.0/ichorcna.smk

+        seg = CFG["dirs"]["outputs"] + "{seq_type}--{genome_build}/seg/{binSize}/{tumour_id}--{normal_id}--{pair_status}.seg",
+        plot = CFG["dirs"]["outputs"] + "{seq_type}--{genome_build}/plot/{binSize}/{tumour_id}--{normal_id}--{pair_status}_genomeWide.pdf"
+    run:
+        op.relative_symlink(input.corrDepth, output.corrDepth)


The latest version of oncopipe also supports the argument in_module=True, which creates "shallow" symlinks. You can add it here as well; the recent modules have examples of this.

Added for all output relative_symlinks

Kdreval · 2021-04-28T17:23:46Z

modules/ichorcna/1.0/ichorcna.smk

+##### RULES #####
+# ---------------------------------------------------------------------------- #
+
+CFG["runs"]["binSize"] = str(CFG["options"]["readcounter"]["binSize"])


What does this do? I understand that it basically tells rule all the binSize for each sample. If it is the same for all samples, you can just use str(CFG["options"]["readcounter"]["binSize"]) in rule all and multiply it by the number of samples, for example *len(CFG["runs"]["tumour_sample_id"]), so it expands to one entry per sample. There is an example in the rule all of the liftover module.

Your interpretation is correct - I modified it to your method from liftOver.
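
The approach adopted here can be sketched in plain Python, using made-up sample columns in place of CFG["runs"]: the single bin size is repeated once per tumour so that it lines up with the other per-sample columns. In the module itself, Snakemake's expand() with zip would do the pairing; the list comprehension below only imitates that behaviour. The column names other than tumour_sample_id, and all sample values, are assumptions for illustration.

```python
# Hypothetical stand-ins for the columns of CFG["runs"]
runs = {
    "tumour_seq_type": ["capture", "capture", "genome"],
    "tumour_genome_build": ["hg38", "hg19", "grch37"],
    "tumour_sample_id": ["T1", "T2", "T3"],
    "normal_sample_id": ["N1", "N2", "None"],
    "pair_status": ["matched", "matched", "no_normal"],
}

bin_size = 1000000  # one value shared by every sample

# Repeat the single bin size so it has one entry per tumour sample
bin_sizes = [str(bin_size)] * len(runs["tumour_sample_id"])

# Output pattern taken from the seg output of the module's final rule
pattern = ("{seq_type}--{genome_build}/seg/{binSize}/"
           "{tumour_id}--{normal_id}--{pair_status}.seg")

# Pair the columns element by element, as expand(..., zip, ...) would
targets = [
    pattern.format(seq_type=s, genome_build=g, binSize=b,
                   tumour_id=t, normal_id=n, pair_status=p)
    for s, g, b, t, n, p in zip(
        runs["tumour_seq_type"], runs["tumour_genome_build"], bin_sizes,
        runs["tumour_sample_id"], runs["normal_sample_id"],
        runs["pair_status"])
]

for target in targets:
    print(target)
```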

Jwong684 · 2021-04-29T09:05:31Z

The modified version has been tested: it runs to completion for grch37, hg19, and hg38 samples.

Since I had reference files that I needed to link and modify from ichorCNA's GitHub repo, I decided to have Snakemake clone their repo into the CFG["dirs"]["inputs"] directory so I could access these files in downstream rules.

The naming convention for ichorCNA's reference files is messy (e.g. HD_ULP_PoN_1Mb_median_normAutosome_mapScoreFilteredmedian.rds for hg19, and HD_ULP_PoNhg38_1Mb_median_normAutosome_median.rds for hg38). I originally just renamed these files when I had them as part of the module, and symlinked reference files for related genome_builds (e.g. hs37d5, hg19, and grch37 would all have symlinks to the same reference file). I have now incorporated this symlinking step into the module.

Also added Chris's suggestion of tweaking the default parameters.
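
The symlinking step described in this comment could look roughly like the sketch below, written as standalone Python rather than as the module's actual rule. The two upstream file names are the ones quoted above. The target naming pattern (extrapolated from the 500kb entry in default.yaml), the grouping of grch38 with hg38, and the function name link_pon_files are assumptions.

```python
import os
import tempfile

# Upstream panel-of-normals file names as quoted in the comment above
UPSTREAM_PON = {
    "hg19": "HD_ULP_PoN_1Mb_median_normAutosome_mapScoreFilteredmedian.rds",
    "hg38": "HD_ULP_PoNhg38_1Mb_median_normAutosome_median.rds",
}

# Which upstream file each genome_build shares (hg38/grch38 pairing assumed)
BUILD_FAMILY = {
    "hg19": "hg19", "grch37": "hg19", "hs37d5": "hg19",
    "hg38": "hg38", "grch38": "hg38",
}

# Assumed uniform target name, modelled on the config's 500kb entry
TARGET = "HD_ULP_PoN_{genome_build}_1000kb_median_normAutosome_median.rds"

def link_pon_files(extdata_dir, out_dir):
    """Create one consistently named symlink per genome_build."""
    os.makedirs(out_dir, exist_ok=True)
    created = {}
    for build, family in BUILD_FAMILY.items():
        src = os.path.abspath(os.path.join(extdata_dir, UPSTREAM_PON[family]))
        dest = os.path.join(out_dir, TARGET.format(genome_build=build))
        if os.path.lexists(dest):
            os.remove(dest)  # replace a stale link from an earlier run
        os.symlink(src, dest)
        created[build] = dest
    return created

# Demonstration on empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    extdata = os.path.join(tmp, "extdata")
    os.makedirs(extdata)
    for name in UPSTREAM_PON.values():
        open(os.path.join(extdata, name), "w").close()
    links = link_pon_files(extdata, os.path.join(tmp, "refs"))
    demo_targets = {build: os.path.basename(os.path.realpath(path))
                    for build, path in links.items()}

print(demo_targets)
```

Downstream rules can then refer to a single {genome_build}-based pattern, regardless of how the upstream repository names each file.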

Kdreval · 2021-04-29T13:12:19Z

modules/ichorcna/1.0/src/runIchorCNA.R

@@ -0,0 +1,423 @@
+# file:   ichorCNA.R


This file is now downloaded directly, right?

No, this is the one that I had to modify to allow hs37d5 to be included. They hard-coded the available genomes.

Kdreval · 2021-04-29T13:14:50Z

modules/ichorcna/1.0/config/default.yaml

+                    hg38: "{MODSDIR}/src/inst/extdata/GRCh38.GCA_000001405.2_centromere_acen.txt"
+                ichorCNA_minMapScore: 0.75
+                ichorCNA_chrs:  
+                    grch37: "c('1', '2', '3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','X')"


Can this be moved to the snakefile instead of being in the config? There, you can use the file listing chromosomes generated by the reference_files workflow (main_chromosomes.txt) or use a function to generate the chromosome names (there is an example in the sage module).

Kdreval · 2021-04-29T13:22:28Z

modules/ichorcna/1.0/config/default.yaml

+                binSize:  1000000 # set window size to compute coverage 
+                # available binSizes are: 1000000, 500000, 50000, 10000
+            run:
+                ichorCNA_libdir: ""


Are all these files in inst being auto-downloaded? If so, they are probably not needed in the config, since you set the naming convention and the paths to these files in the snakefile rule.

For the chromosomes, I could use the function for the readCounter step since I need the command to look like:
"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y"
but for the ichorCNA step, it needs to be in R-vector format:
"c('1', '2', '3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','X')"
Wouldn't it be simpler to keep it like this in the config? Otherwise, I'd need to make a function to include c( ), and ' '

The ichorCNA_libdir parameter is supposed to be set to where the ichorCNA GitHub repo resides (ichorCNA uses it to look for inst/extdata/ underneath that directory). I just left it in the config since it might be useful if someone needs to change the path one day.

How is that vector being given to ichorCNA?

In the command it would be --chrs "c('1', '2', '3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','X')"

You could construct that in Python probably and then just add that as a "param" in your rule.
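
A minimal sketch of that suggestion, assuming the chromosome names are available as a Python list (the helper names and the main_chromosomes generator are hypothetical; the module could equally read the names from main_chromosomes.txt). One list feeds both formats: the comma-separated string for readCounter and the R vector literal for ichorCNA's --chrs.

```python
def main_chromosomes(prefix="", include_y=True):
    """Return autosomes 1-22 plus X (and optionally Y), with an optional prefix."""
    names = [str(i) for i in range(1, 23)] + ["X"]
    if include_y:
        names.append("Y")
    return [prefix + name for name in names]

def chrs_for_readcounter(chromosomes):
    """Comma-separated form, e.g. 1,2,X"""
    return ",".join(chromosomes)

def chrs_for_ichorcna(chromosomes):
    """R character-vector literal, e.g. c('1','2','X')"""
    quoted = ",".join(f"'{name}'" for name in chromosomes)
    return f"c({quoted})"

# readCounter receives Y as well; the ichorCNA vector in the config stops at X
print(chrs_for_readcounter(main_chromosomes()))
print(chrs_for_ichorcna(main_chromosomes(include_y=False)))
print(chrs_for_ichorcna(main_chromosomes(prefix="chr", include_y=False)))
```

Either string can then be returned from a params function in the rule, so the config no longer needs one hand-written chromosome list per genome build.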

Jwong684 added 6 commits June 28, 2020 21:54

Add initial draft of ichorcna version 1.0

d8ba998

Merge branch 'master' of https://github.com/LCR-BCCRC/lcr-modules int…

7e7dbc7

…o module/ichorcna/1.0

Updated ichorcna config and smk framework

e357f09

Working version of ichorcna

a8f558b

removed the README (unnecessary download instructions)

3cbe645

ichorCNA working module

80edb57

Jwong684 requested review from rdmorin and Kdreval April 28, 2021 16:51

rdmorin requested changes Apr 28, 2021

Kdreval reviewed Apr 28, 2021

overhaul ichorcna - address all PR; ichorcna and dependencies are dow…

874bb89

…nloaded into the input directory

Kdreval reviewed Apr 29, 2021

Jwong684 added 2 commits April 29, 2021 14:00

Added a chr- function for readCounter

0466b04

ichorCNA changed chrom input for R too

ad9c362

Jwong684 requested review from rdmorin and Kdreval May 4, 2021 02:04

rdmorin reviewed May 4, 2021

modules/ichorcna/1.0/ichorcna.smk (resolved)

rdmorin approved these changes May 4, 2021

rdmorin merged commit b701fff into master May 4, 2021
