Genotype outputs all sites #142

Open
lihaicheng7003 opened this issue Nov 14, 2023 · 4 comments

@lihaicheng7003

Can GraphTyper output genotypes for every site in a region, for example all 16,569 bp of the mitochondrial genome (chrM)?

```
./graphtyper genotype ~/data1/References/human/Homo_sapiens_assembly38.fasta --sam=~/data1/project10/subdata/1.markdup.bam --region=chrM --output=~/data1/project19/results_chrM --vcf=output_chrM.vcf.modif.gz --force_no_filter_zero_qual
```

I tried this command, but the resulting VCF does not contain any sites.
The file passed to --vcf (output_chrM.vcf.modif.gz) looks like this:

```
##fileformat=VCFv4.2
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chrM    1       .       G       G       0       .       .
chrM    2       .       A       A       0       .       .
```

@lihaicheng7003
Author

I found the problem: REF and ALT in the --vcf file cannot be the same, because GraphTyper ignores sites where REF and ALT are identical.
I manually made a VCF for --vcf that lists the three possible alternative bases at each position, so that all sites are output:

```
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chrM	1	.	G	A	0	.	.
chrM	1	.	G	T	0	.	.
chrM	1	.	G	C	0	.	.
chrM	2	.	A	G	0	.	.
chrM	2	.	A	T	0	.	.
chrM	2	.	A	C	0	.	.
```

Posting this for others' reference.
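An all-sites `--vcf` input like the one above can be generated instead of typed by hand. Below is a minimal Python sketch, assuming a plain (uncompressed) FASTA reference and that one record per non-reference base is what you want, since GraphTyper ignores records where REF equals ALT. The function names are hypothetical helpers, not part of GraphTyper. To avoid scanning the whole GRCh38 file, you could first extract the contig with `samtools faidx Homo_sapiens_assembly38.fasta chrM > chrM.fa`.

```python
from typing import Iterator, TextIO

BASES = "ACGT"


def read_fasta_sequence(handle: TextIO, name: str) -> str:
    """Return the (uppercased) sequence of the FASTA record whose ID is `name`."""
    chunks = []
    in_record = False
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if in_record:
                break  # reached the next record; the requested one is complete
            in_record = line[1:].split()[0] == name
        elif in_record:
            chunks.append(line)
    return "".join(chunks).upper()  # uppercase removes soft-masking


def all_sites_vcf(chrom: str, seq: str) -> Iterator[str]:
    """Yield VCF lines with one record per alternative base at every position."""
    yield "##fileformat=VCFv4.2"
    yield f"##contig=<ID={chrom},length={len(seq)}>"
    yield "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
    for pos, ref in enumerate(seq, start=1):
        if ref not in BASES:
            continue  # skip N and other ambiguity codes (a choice made here)
        for alt in BASES:
            if alt != ref:  # REF == ALT records would be ignored by GraphTyper
                yield f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t0\t.\t."


def write_all_sites_vcf(fasta_path: str, chrom: str, out_path: str) -> None:
    """Write the all-sites VCF for `chrom` as plain text to `out_path`."""
    with open(fasta_path) as fh:
        seq = read_fasta_sequence(fh, chrom)
    if not seq:
        raise ValueError(f"{chrom} not found in {fasta_path}")
    with open(out_path, "w") as out:
        for line in all_sites_vcf(chrom, seq):
            out.write(line + "\n")
```

For example, `write_all_sites_vcf("chrM.fa", "chrM", "chrM_all_sites.vcf")` writes three records per position. The output is uncompressed; compress and index it the same way as your existing --vcf file before passing it to GraphTyper.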

@hannespetur
Member

Hey, this will create a graph where every read maps perfectly at every position. I guess you want read counts for each base? I'd suggest using samtools mpileup instead.

Best,
Hannes
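For per-base read counts, an invocation along the lines of `samtools mpileup -a -r chrM -f Homo_sapiens_assembly38.fasta 1.markdup.bam` would work (`-a` also reports zero-depth positions; mpileup's default read and base-quality filters still apply). The fifth column of its output encodes the bases observed at each position. The sketch below is a hypothetical helper, not part of samtools, that turns that column into A/C/G/T/N counts plus deletions. It assumes the standard pileup encoding: `.`/`,` for reference matches, `^` plus one mapping-quality character at read starts, `$` at read ends, and `+N`/`-N` followed by N bases for indels.

```python
import re
from collections import Counter
from typing import Tuple

_INDEL_LEN = re.compile(r"\d+")


def count_pileup_bases(ref: str, bases: str) -> Counter:
    """Count bases in one samtools mpileup 'bases' column.

    Keys are A/C/G/T/N plus '*' for deletions spanning this position.
    """
    counts: Counter = Counter()
    ref = ref.upper()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # read start: '^' is followed by one mapping-quality char
        elif c == "$":
            i += 1  # read end marker, no base attached
        elif c in "+-":
            # Indel after this position: sign, length digits, then that many bases.
            m = _INDEL_LEN.match(bases, i + 1)
            i += 1 + len(m.group()) + int(m.group())
        else:
            if c in ".,":
                counts[ref] += 1  # match to the reference (forward / reverse strand)
            elif c.upper() in "ACGTN":
                counts[c.upper()] += 1  # mismatch (upper = forward, lower = reverse)
            elif c in "*#":
                counts["*"] += 1  # deletion ('#' = reverse strand with --reverse-del)
            # '>' and '<' (reference skips) are not counted
            i += 1
    return counts


def parse_pileup_line(line: str) -> Tuple[str, int, Counter]:
    """Return (chrom, pos, base counts) for one line of mpileup output."""
    chrom, pos, ref, depth, bases = line.rstrip("\n").split("\t")[:5]
    if int(depth) == 0:
        return chrom, int(pos), Counter()  # zero-depth rows carry no bases
    return chrom, int(pos), count_pileup_bases(ref, bases)
```

Feeding each line of the mpileup output through `parse_pileup_line` gives one count table per chrM position.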

@lihaicheng7003
Author

Thanks for the reply. I am still testing this software.

@lihaicheng7003
Author

With default parameters for both tools and NA12878 (1000 Genomes) as the only input, GraphTyper calls approximately 19,000 variants, while GATK calls over 40,000. Is this normal?
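Raw totals from two callers are only comparable if both are counted the same way. One way to check is to tally both VCFs with the same rule. The sketch below is a hypothetical helper (not part of GraphTyper or GATK) that counts records in which the first sample carries a non-reference allele, split by FILTER status and by SNV versus other variant types. It treats an unfiltered record (FILTER `.`) as passing, which is an assumption you may want to change.

```python
import gzip
import re
from collections import Counter


def summarize_vcf(path: str) -> Counter:
    """Tally records where the first sample carries a non-reference allele,
    keyed by (FILTER status, variant class)."""
    opener = gzip.open if path.endswith(".gz") else open
    counts: Counter = Counter()
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 10:
                continue  # no FORMAT/sample columns to inspect
            ref, filt = fields[3], fields[6]
            # Drop symbolic or placeholder alleles such as <NON_REF> or *.
            alts = [a for a in fields[4].split(",")
                    if a not in (".", "*") and not a.startswith("<")]
            fmt_keys = fields[8].split(":")
            if not alts or "GT" not in fmt_keys:
                continue
            gt = fields[9].split(":")[fmt_keys.index("GT")]
            if not any(a not in (".", "0") for a in re.split(r"[/|]", gt)):
                continue  # sample is hom-ref or a no-call at this site
            # Assumption: an unfiltered record ('.') counts as passing.
            status = "PASS" if filt in ("PASS", ".") else "filtered"
            is_snv = len(ref) == 1 and all(len(a) == 1 for a in alts)
            counts[(status, "SNV" if is_snv else "other")] += 1
    return counts
```

Running it on both call sets, restricted to the same regions, would show whether the gap is concentrated in filtered records, in indels and other non-SNV records, or in sites where the sample is not genotyped as carrying the variant.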
