Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow input of already p-shifted reads (bigwig) #102

Open
Roleren opened this issue Dec 15, 2021 · 3 comments
Open

Allow input of already p-shifted reads (bigwig) #102

Roleren opened this issue Dec 15, 2021 · 3 comments

Comments

@Roleren
Copy link

Roleren commented Dec 15, 2021

Is your feature request related to a problem? Please describe.
This tool can not input already shifted data, makes it less useful for the community,
since sometimes large bigwig files can be used, which are not possible to fit in a bam file.

Describe the solution you'd like
optional input for bigwig (ORFquant supports this with very little tinkering already)

Describe alternatives you've considered
One alternative is either to use another tool,
or I can also just hack a solution by creating a fork that skips to the step
where you have the alignments as p sites

Additional context
For very large Ribo-seq files (What would have been a > 1 TB bam file etc)
With bigwig, it can be loaded basically instantly.

@saketkc
Copy link
Collaborator

saketkc commented Dec 15, 2021

Thanks for the suggestions. While I do not have immediate plans to support this functionality, a PR is always welcome.

@Roleren
Copy link
Author

Roleren commented Dec 16, 2021

I have already forked the repo, I think I see how it can be done.

Your final processing before ORF prediction is this:
merge_read_lengths()
It claims to return:

Returns
    -------
    merged_alignments: dict(dict)
                       alignments by merging all lengths

I see the dict is structured like this:

count = alignments[length][strand][(chrom, pos)]
So it is a dict, by length -> strand -> then a double key on chrome and pos.
In the final return, you remove the length key and merge the results of all lengths.
Resulting in this dict structure:
merged_alignments[strand][(chrom, pos_shifted)]

So if I understand correctly, I would need to convert the bigwig (which is split on strand)
For each value in bigwig per chromosome, input the shifted read position, what you call (pos_shifted)

I then could call the main detection function: export_orf_coverages()
using this dict I made from the bigwig.
This is possible, and of course then the metagene plots are skiped, since it is presumed the user
already did this correctly.

I think it sounds good, what do you think? Would you be interested to put it as a separate function in the cli if I make a PR,
so that people could easily call if from directly from bigwig?

Would make your already quite fast tool, finish in no time.

@saketkc
Copy link
Collaborator

saketkc commented Dec 16, 2021

Your understanding is correct. I will just elaborate it below.

  1. Bigwig input mode will require two files: 1) positive strand bigwig 2) negative strand bigwig
  2. The offset can then be set to 0 (because the input bigwigs have already been shifted), this means the method could in principle be called by merge_read_lengths(alignments, psite_offsets = dict([read_length, 0] for read_length in alignments.keys()))

In principle, all that needs to be modified is a helper function that enables bigwig -> alignments dict conversion. Does this make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants