Allow input of already p-shifted reads (bigwig) #102

Roleren · 2021-12-15T22:54:05Z

Is your feature request related to a problem? Please describe.
This tool cannot take already-shifted data as input, which makes it less useful for the community,
since large datasets are sometimes only available as bigwig files and would not fit in a BAM file.

Describe the solution you'd like
Optional bigwig input (ORFquant already supports this with very little tinkering).

Describe alternatives you've considered
One alternative is to use another tool.
Another is to hack a solution by creating a fork that skips to the step
where the alignments are already P-sites.

Additional context
This matters for very large Ribo-seq datasets (what would have been a > 1 TB BAM file, etc.).
With bigwig, the data can be loaded basically instantly.

saketkc · 2021-12-15T22:59:49Z

Thanks for the suggestions. While I do not have immediate plans to support this functionality, a PR is always welcome.

Roleren · 2021-12-16T10:27:00Z

I have already forked the repo, and I think I see how it can be done.

Your final processing before ORF prediction is this:
merge_read_lengths()
It claims to return:

Returns
    -------
    merged_alignments: dict(dict)
                       alignments by merging all lengths

I see the dict is structured like this:

count = alignments[length][strand][(chrom, pos)]
So it is a dict keyed by length -> strand -> a (chrom, pos) tuple.
In the final return, you remove the length key and merge the results of all lengths,
resulting in this dict structure:
merged_alignments[strand][(chrom, pos_shifted)]
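
To make the two shapes concrete, here is a minimal sketch of that merge on toy data. It is not the actual implementation of merge_read_lengths(): the name `merge_lengths_sketch`, the toy counts, and the assumption that the offset is added on the + strand and subtracted on the - strand are all hypothetical.

```python
from collections import defaultdict


def merge_lengths_sketch(alignments, psite_offsets):
    """Collapse alignments[length][strand][(chrom, pos)] into
    merged[strand][(chrom, pos_shifted)].

    Hypothetical stand-in for merge_read_lengths(); the strand handling
    (add the offset on '+', subtract it on '-') is an assumption.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for length, by_strand in alignments.items():
        offset = psite_offsets[length]
        for strand, counts in by_strand.items():
            for (chrom, pos), count in counts.items():
                pos_shifted = pos + offset if strand == "+" else pos - offset
                # Counts from different lengths that land on the same
                # shifted position are summed.
                merged[strand][(chrom, pos_shifted)] += count
    return {strand: dict(counts) for strand, counts in merged.items()}


# Toy input: two read lengths whose shifted positions coincide on '+'.
toy_alignments = {
    28: {"+": {("chr1", 100): 3}, "-": {("chr1", 500): 1}},
    29: {"+": {("chr1", 99): 2}},
}
toy_offsets = {28: 12, 29: 13}
merged_toy = merge_lengths_sketch(toy_alignments, toy_offsets)
```

The length key disappears in the result, and the two + strand entries end up under the single key ("chr1", 112).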

So if I understand correctly, I would need to convert the bigwig (which is split by strand):
for each value in the bigwig, per chromosome, I would insert the count at the shifted read position (what you call pos_shifted).

I could then call the main detection function, export_orf_coverages(),
using the dict I made from the bigwig.
This is possible, and of course the metagene plots are then skipped, since it is presumed the user
already did the shifting correctly.

I think it sounds good, what do you think? Would you be interested in adding it as a separate function in the CLI if I make a PR,
so that people could easily call it directly from bigwig?

It would make your already quite fast tool finish in no time.

saketkc · 2021-12-16T16:29:06Z

Your understanding is correct. I will just elaborate on it below.

Bigwig input mode will require two files: 1) a positive-strand bigwig 2) a negative-strand bigwig
The offset can then be set to 0, because the input bigwigs have already been shifted. This means the method could in principle be called as merge_read_lengths(alignments, psite_offsets = dict([read_length, 0] for read_length in alignments.keys()))

In principle, all that needs to be modified is a helper function that enables bigwig -> alignments dict conversion. Does this make sense?
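
A rough sketch of what such a helper could look like, for illustration only. The function name `bigwig_intervals_to_alignments`, the placeholder read-length key 0, and the choice to keep 0-based positions are assumptions, not part of the existing code. The input is a plain list of (chrom, start, end, value) tuples per strand, which would first have to be read from the two bigwig files with a bigwig reader (for example by looping over chromosomes with pyBigWig's `intervals()`).

```python
def bigwig_intervals_to_alignments(plus_intervals, minus_intervals, length_key=0):
    """Build alignments[length][strand][(chrom, pos)] from per-strand intervals.

    Each interval is (chrom, start, end, value) with a 0-based, half-open
    [start, end) span, as bigWig stores them. All counts are filed under a
    single placeholder read length (hypothetical choice), since a bigWig
    carries no read-length information.
    """
    by_strand = {"+": {}, "-": {}}
    for strand, intervals in (("+", plus_intervals), ("-", minus_intervals)):
        counts = by_strand[strand]
        for chrom, start, end, value in intervals:
            if value == 0:
                continue  # skip empty positions to keep the dict sparse
            # A multi-base interval carries the same value at every base.
            for pos in range(start, end):
                counts[(chrom, pos)] = counts.get((chrom, pos), 0) + value
    return {length_key: by_strand}


# Toy intervals standing in for the contents of the two bigwig files.
plus = [("chr1", 10, 12, 4.0), ("chr1", 20, 21, 0.0)]
minus = [("chr2", 5, 6, 1.0)]
alignments = bigwig_intervals_to_alignments(plus, minus)

# Zero offsets for every (placeholder) read length, as suggested above.
psite_offsets = {read_length: 0 for read_length in alignments}
```

With a dict of this shape and all offsets at 0, the merge step should leave the positions untouched. Whether positions need converting to another coordinate convention would have to be checked against the BAM code path.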
