mlml "error: chrom or pos mismatch in both files" #135

Open
christianwake opened this issue Sep 12, 2023 · 3 comments

Comments

@christianwake

While running dnmtools mlml (v1.2.4, from the command line) on mock bisulfite and oxidative bisulfite data, some, but not all, samples yielded this error:
error: chrom or pos mismatch in both files

The input files were generated from dnmtools counts from bam files from Bismark alignment.

I found that the reference had some small contig "chromosomes", one of which apparently had at least one read for one library type but not the other, causing a mismatch in the chromosomes represented in the two .meth files.
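For anyone hitting the same error, here is a minimal sketch of how the two files could be compared to find the offending contig. It assumes only that each row is whitespace-separated with the chromosome name in the first column, and that any header lines start with `#`. The function names and the sample rows are hypothetical and not part of dnmtools.

```python
def chrom_order(lines):
    """Return chromosome names in order of first appearance."""
    order = []
    seen = set()
    for line in lines:
        # Skip blank lines and (assumed) '#' header lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split()[0]
        if chrom not in seen:
            seen.add(chrom)
            order.append(chrom)
    return order


def chrom_diff(lines_a, lines_b):
    """Return (names only in a, names only in b), each in file order."""
    a, b = chrom_order(lines_a), chrom_order(lines_b)
    set_a, set_b = set(a), set(b)
    only_a = [c for c in a if c not in set_b]
    only_b = [c for c in b if c not in set_a]
    return only_a, only_b


# Hypothetical rows; only the first column matters for this check.
bs = ["chr1\t10\t+\tCpG\t0.5\t4", "chrUn_1\t7\t+\tCpG\t1\t1", "chr2\t3\t+\tCpG\t0\t2"]
ox = ["chr1\t10\t+\tCpG\t0.25\t8", "chr2\t3\t+\tCpG\t0\t5"]
print(chrom_diff(bs, ox))  # (['chrUn_1'], [])
```

With real data, the two arguments can be open file handles, since the functions only iterate over lines.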

The error was resolved when I used awk to filter the rows from that contig out of the .meth file, but it would be better if mlml handled this case itself.
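The same kind of filtering can be sketched in Python. This is only an illustration of the workaround, not the exact awk command used: it assumes the chromosome name is the first whitespace-separated field, keeps any `#` header lines, and the function name and contig name are hypothetical.

```python
def drop_contigs(lines, contigs):
    """Yield every line whose first field is not one of the given contigs."""
    contigs = set(contigs)
    for line in lines:
        fields = line.split()
        # Drop data rows from unwanted contigs; keep headers and blank lines.
        if fields and not line.startswith("#") and fields[0] in contigs:
            continue
        yield line


# Hypothetical rows; "chrUn_1" stands in for the contig seen in only one file.
rows = [
    "chr1\t10\t+\tCpG\t0.5\t4\n",
    "chrUn_1\t7\t+\tCpG\t1\t1\n",
    "chr2\t3\t+\tCpG\t0\t2\n",
]
filtered = list(drop_contigs(rows, {"chrUn_1"}))
print("".join(filtered), end="")
```

Because the function is a generator, it can stream a large file line by line and write the kept rows straight to an output file.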

@andrewdavidsmith
Collaborator

@christianwake I think we can make this more convenient. My preference would be to work harder to ensure that the sites included in the counts files depend on the reference genome, and not the mapped reads files. And in addition have an option for mlml that could -force it to ignore a discrepancy if the user were confident the two files are from the same reference and have a consistent sorted order. I think we've made some progress in the former, but it's not yet in a release.

@andrewdavidsmith
Collaborator

Relevant here: PR #118 "counts: reporting chromosomes without mapped reads" So using dnmtools counts from the source repo since about 2 weeks ago would have hopefully avoided the issue.

@andrewdavidsmith
Collaborator

@christianwake This is probably solved for most users in dnmtools v1.4.0 because it will ensure that the .counts input files have all chromosomes from the reference genome, even if no read maps to them. I'm not closing this yet because I'm considering updating mlml to make it robust to missing chromosomes, but it's tricky since mlml can take 3 inputs, so distinguishing inconsistent sorting vs. missing chroms is more involved. In case you are using Conda, we plan to release 1.4.1 on Conda this week.
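To illustrate why the distinction is more involved with several inputs, here is a rough sketch of one way to separate the two cases. It is not how mlml works. It assumes each input has already been reduced to its list of chromosome names in order of first appearance, with each name appearing once, and it ignores positions within a chromosome. The function name and return labels are hypothetical.

```python
def classify_chrom_discrepancy(orders):
    """Classify chromosome-name lists from two or more inputs.

    Returns "consistent" if all lists are identical, "missing_chroms" if
    they differ only by names absent from some inputs, and
    "inconsistent_order" if the shared names appear in different orders.
    """
    orders = [list(o) for o in orders]
    if all(o == orders[0] for o in orders):
        return "consistent"
    # Names present in every input.
    shared = set(orders[0]).intersection(*orders[1:])
    # Each input's order restricted to the shared names.
    restricted = [[c for c in o if c in shared] for o in orders]
    if all(r == restricted[0] for r in restricted):
        return "missing_chroms"
    return "inconsistent_order"


# Three hypothetical inputs, one of which lacks a small contig.
print(classify_chrom_discrepancy([
    ["chr1", "chr2", "chrUn_1"],
    ["chr1", "chr2"],
    ["chr1", "chr2", "chrUn_1"],
]))
```

Comparing only the shared names is a simplification: a real check would also have to decide what to do with sites on chromosomes that some inputs lack.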
