Exact sorting of the output bed file? #173

Open
Dario-Galanti opened this issue Mar 30, 2022 · 2 comments

Comments

@Dario-Galanti

When mosdepth is run with a BED file of regions as input, the regions are sorted in the output file.
What exact sort order does it use?
I would like to reproduce exactly the same sorting (or suppress sorting altogether) so that I can paste the extra columns (after the 4th one), which are in the input BED file but not reported in the output.
Is it perhaps a classic alphabetic sort on the first column followed by a numeric sort on the second?
sort -k1,1 -k2,2n

Thank you very much for any help

@brentp
Owner

brentp commented Mar 30, 2022

Yes, sorted by start numerically within each chromosome. https://github.com/brentp/mosdepth/blob/master/mosdepth.nim#L342

The chromosomes themselves follow the order of the @SQ lines in the SAM header (the SN fields), so you'd have to use that information to do the sorting.
If you have the fasta and fai used to create the bam file, then you can use gsort like:

gsort $bed $fasta.fai > $sorted_bed
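If gsort is not at hand, the ordering described above (chromosomes in the order of the .fai, then start numerically within each chromosome) can be approximated with awk and sort. This is only a sketch: `sort_bed_by_fai` is a hypothetical helper name, it assumes a tab-delimited BED whose chromosome names all appear in column 1 of the .fai (lines with unknown chromosomes are dropped), and it has not been checked against how mosdepth orders regions that share the same start.

```shell
# Hypothetical helper: sort a BED file by the chromosome order of a .fai index,
# then by start position numerically within each chromosome.
# Usage: sort_bed_by_fai regions.bed genome.fa.fai > sorted.bed
sort_bed_by_fai() {
    tab=$(printf '\t')
    awk -v OFS='\t' '
        NR == FNR    { rank[$1] = FNR; next }   # first file (.fai): record each chromosome rank
        ($1 in rank) { print rank[$1], $0 }     # second file (BED): prefix each line with its rank
    ' "$2" "$1" \
        | sort -t "$tab" -k1,1n -k3,3n \
        | cut -f2-                              # drop the temporary rank column again
}
```

After the rank is prepended, the start coordinate sits in field 3, which is why the second sort key is `-k3,3n` rather than `-k2,2n`.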

@Dario-Galanti
Author

Dario-Galanti commented Mar 31, 2022

Very useful, thanks very much!
I hadn't thought of re-sorting the mosdepth output and checking the md5 hash.
In my case I could reproduce the sorting with sort -k1,1V -k2,2n
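The check-then-paste workflow discussed in this thread could look roughly like the sketch below. Both function names are hypothetical, both inputs are assumed to be uncompressed, tab-delimited BED files with the same number of lines, and `cksum` stands in for the md5 hash only because it is available on any POSIX system. The extra columns are taken from field 5 onward, matching "after the 4th one" in the original question.

```shell
# Hypothetical helper: succeed only if chrom, start and end (columns 1-3)
# are identical, line by line, in the two BED files.
same_region_order() {
    a=$(cut -f1-3 "$1" | cksum)
    b=$(cut -f1-3 "$2" | cksum)
    [ "$a" = "$b" ]
}

# Hypothetical helper: append columns 5+ of the re-sorted input BED ($1)
# to the mosdepth-style output ($2), but only if the region order matches.
paste_extra_columns() {
    if same_region_order "$1" "$2"; then
        cut -f5- "$1" | paste "$2" -
    else
        echo "region order differs; refusing to paste" >&2
        return 1
    fi
}
```

A possible use would be to re-sort the input with `sort -k1,1V -k2,2n input.bed > resorted.bed` (the `V` version-sort modifier is a GNU sort extension), decompress the mosdepth regions file, and then run `paste_extra_columns resorted.bed regions.bed`.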
