
Option to include/switch reference data and a global preview table #1

Open
skanwal opened this issue Nov 8, 2018 · 4 comments

Comments

@skanwal

skanwal commented Nov 8, 2018

Hi Sigve

Thanks for another awesome framework. We (@umccr) are very much interested in incorporating this into our reporting.

I have looked at the GitHub repo/code and tested it locally, and it works great. We have a couple of questions/comments:

  1. Would it be possible to feed in our own reference (BED) files?

We are interested in using some of the reference data from Hartwig. Looking at the codebase, this shouldn't be a problem, as the data directory is passed as an argument and contains the reference data used for the analysis. However, might this affect the annotations that the framework reads in from the .tsv file(s) in `cacao_utils.R` for specific clinical genomic tracks?

There is an optional flag, `--target`, which, as I understand it, refers to the targeted region of the input sample?

  2. Would it make sense to have one global table (checking coverage for specific genes), stratified by callability, instead of having to go through multiple tracks?

This probably links back to point 1, i.e. feeding in one specific BED track (in this case), which could be a joint set of loci from various sources such as CIViC, CGI and OncoKB, and then reading in the optional annotations (as in the code base) for this data. Does this idea align well with your original intention for the framework?

  3. It would be useful to have an option to limit the hereditary cancer pathogenic loci table to the cancer predisposition genes that are also used/referenced here: https://github.com/sigven/cpsr
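A minimal Python sketch of what such a global table could look like, under assumptions that are not taken from CACAO itself: loci from several sources (e.g. CIViC and CGI) are merged into one BED-like set, each locus is assigned a callability class from a depth value using hypothetical cut-offs, and counts are tabulated per gene, optionally restricted to a gene set such as the CPSR predisposition genes. The class labels, thresholds, function names, toy genes and toy coordinates are all hypothetical placeholders.

```python
from collections import Counter, defaultdict

# Hypothetical depth cut-offs (not CACAO's actual settings).
LOW_COVERAGE_MIN = 10
HIGH_COVERAGE_MIN = 100


def callability(depth: int) -> str:
    """Map a locus depth to a hypothetical callability class."""
    if depth == 0:
        return "NO_COVERAGE"
    if depth < LOW_COVERAGE_MIN:
        return "LOW_COVERAGE"
    if depth < HIGH_COVERAGE_MIN:
        return "CALLABLE"
    return "HIGH_COVERAGE"


def merge_loci(*sources):
    """Union loci from several sources into one sorted, de-duplicated list.

    Each source is an iterable of BED-like (chrom, start, end, gene) tuples.
    A locus reported by more than one source is kept only once.
    """
    merged = set()
    for source in sources:
        merged.update(source)
    return sorted(merged)


def gene_callability_table(loci, depth_at, genes=None):
    """Count loci per callability class for each gene.

    depth_at: function returning the depth for a (chrom, start, end) locus.
    genes: optional set of gene symbols; other genes are skipped.
    """
    table = defaultdict(Counter)
    for chrom, start, end, gene in loci:
        if genes is not None and gene not in genes:
            continue
        table[gene][callability(depth_at(chrom, start, end))] += 1
    return {gene: dict(counts) for gene, counts in table.items()}


# Toy input: two "sources" sharing one locus, plus made-up depths.
source_a = [("chr1", 100, 101, "GENE_A"), ("chr2", 200, 201, "GENE_B")]
source_b = [("chr1", 100, 101, "GENE_A"), ("chr3", 300, 301, "GENE_C")]
toy_depths = {
    ("chr1", 100, 101): 85,
    ("chr2", 200, 201): 4,
    ("chr3", 300, 301): 0,
}

loci = merge_loci(source_a, source_b)
lookup = lambda chrom, start, end: toy_depths[(chrom, start, end)]
table = gene_callability_table(loci, lookup)
predisposition_only = gene_callability_table(loci, lookup, genes={"GENE_A", "GENE_C"})
print(table)
print(predisposition_only)
```

Merging before tabulating means a locus listed by several knowledge bases is counted once per gene, and the optional `genes` filter covers the predisposition-gene restriction from point 3 without a separate code path.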

Sorry about the long commentary and thanks for your time.

Cheers,
Sehrish

@sigven
Owner

sigven commented Nov 9, 2018

Dear Sehrish,

Thanks a lot for your input, highly valuable! Generally, what you suggest makes perfect sense as a further development of the workflow, and parts of your ideas have also been mentioned by some other colleagues here. I will get back to you shortly with my ideas/comments on what is realistic in the short term etc.; I'm very busy here today.

PS. You are correct about `--target`: it should refer to the targeted region of the input sample. But I have in fact not implemented it yet, so it is currently only there as a placeholder. Will update that shortly.

regards,
Sigve

@skanwal
Author

skanwal commented Nov 12, 2018

Hi Sigve,

Thanks for the response and I look forward to hearing back from you.
Happy to coordinate/contribute always.

Regards,
Sehrish

@sigven
Owner

sigven commented Nov 20, 2018

Hi Sehrish,

Coming back to this:

  • What is the reference data from Hartwig? Although processing your own reference data on the fly may be quite challenging, I need an overview of how it looks in order to evaluate this.
  • Regarding the global preview: it seems you are thinking of coverage per gene here, which I understand is useful. The intention of CACAO was primarily to investigate coverage at the variant loci (pathogenic germline variants, somatic hotspots etc.), but I see your point. Then we should also decide what we consider the "gene": the coding sequence only, or all genic sequence (introns, UTRs etc.)?

Would appreciate your input on this.

regards,
Sigve

@skanwal
Author

skanwal commented Nov 21, 2018

Hi Sigve,

Thanks for getting back to this.

• Reference data from Hartwig is:

  • Point mutations from CIViC (CACAO, I understand, uses CIViC to calculate callability for actionable somatic variants)
  • Somatic variants from CGI
  • Oncogenic/likely oncogenic variants from OncoKB (I know you might be cringing at this, considering the apprehensions around licensing). I'll discuss this with Oliver and the team to find out whether we really want to use it.

Also, I do appreciate the point that we need to understand what data we are going to use for presentation, as it's hard to process reference input on the fly.

• Global preview: We are hoping to begin by focusing on coding regions.

It would definitely be very useful to have the ability to switch to the whole gene (including introns), but we can expand on this later.

Happy to have your feedback on this and start looking into implementation as well - if this sounds feasible/useful to you.

Regards,
Sehrish
