Segmentation fault (core dumped) #45

Closed
AnqiZhang-HIT opened this issue Sep 23, 2022 · 14 comments

Comments

@AnqiZhang-HIT

I get results, but the run returns an unsuccessful exit code.
LOG:
Estimation from OptimizeHeter:
Contaminating Sample PC1:-0.025539 PC2:-0.0565987
Intended Sample PC1:-0.0166733 PC2:-0.0297511
FREEMIX(Alpha):0.000110694
NOTICE - Success!
run.sh: line 7: 3722258 Segmentation fault (core dumped) VerifyBamID2 --Reference GRCh38_full_analysis_set_plus_decoy_hla.fa --BamFile test.bam --Output out_prefiex --NumThread 4 --SVDPrefix 1000g.phase3.100k.b38.vcf.gz.dat
MUGQICexitStatus:139
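
For readers decoding the status above: shells conventionally report a process killed by signal N as 128 + N, so 139 corresponds to signal 11 (SIGSEGV), which matches the "Segmentation fault" message. A minimal Python sketch of that decoding follows; the function name `describe_exit_status` is made up for illustration and is not part of VerifyBamID2.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Translate a shell exit status into a readable description.

    Assumes the common shell convention that a status above 128
    means "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Not a signal number known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with code {status}"


print(describe_exit_status(139))
```

On Linux and macOS this reports SIGSEGV, so the non-zero status comes from the crash itself rather than from a failed estimation step.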

@Griffan
Owner

Griffan commented Sep 24, 2022

Could you provide more details about your environment?

@yfarjoun
Contributor

Hi @Griffan, I'm also getting this error, but in my case it happens when I try to create new resource files for vbid2.

My command line is:

verifybamid2 \
--RefVCF resources/CCDG_13607_B01_GRM_WGS_2019-02-19_all.recalibrated_variants.subsetted.vcf.gz \
--Reference references/GRCh38/GRCh38_full_analysis_set_plus_decoy_hla.dict &> resources/log/vbid_reference.log 

I get the following error:

   VerifyBamID2: A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.
    
     Version:2.0.1
     Copyright (c) 2009-2020 by Hyun Min Kang and Fan Zhang
     This project is licensed under the terms of the MIT license.
    
    The following parameters are available.  Ones with "[]" are in effect:
    
    Available Options
                        Input/Output Files : --BamFile [Empty],
                                             --PileupFile [Empty],
                                             --Reference [references/GRCh38/GRCh38_full_analysis_set_plus_decoy_hla.fa],
                                             --SVDPrefix [Empty],
                                             --Output [result]
                   Model Selection Options : --WithinAncestry,
                                             --DisableSanityCheck, --NumPC [2],
                                             --FixPC [Empty],
                                             --FixAlpha [-1.0e+00],
                                             --KnownAF [Empty], --NumThread [4],
                                             --Seed [12345], --Epsilon [1.0e-08],
                                             --OutputPileup, --Verbose
       Construction of SVD Auxiliary Files : --RefVCF [resources/CCDG_13607_B01_GRM_WGS_2019-02-19_all.recalibrated_variants.subsetted.reheadered.vcf.gz]
                            Pileup Options : --min-BQ [13], --min-MQ [2],
                                             --adjust-MQ [40], --max-depth [8000],
                                             --no-orphans, --incl-flags [1040],
                                             --excl-flags [1796]
                        Deprecated Options : --UDPath [Empty], --MeanPath [Empty],
                                             --BedPath [Empty]
    
    
    NOTICE - Specified --RefVCF reference panel VCF file, doing SVD on the fly...
    NOTICE - This procedure will generate SVD matrices as [RefVCF path].UD and [RefVCF path].mu
    NOTICE - You may specify --SVDPrefix [RefVCF path](or --UDPath [RefVCF path].UD and --MeanPath [RefVCF path].mu) in future use
<SNIP>/verifybamid2: line 33: 49452 Segmentation fault: 11  $DIR/VerifyBamID "$@"


Do you know what could be causing this error? The vcf is not empty, and the header matches that in the vcf. 

You asked the previous poster for information about their "environment". Could you clarify what information you are looking for?

@Griffan
Owner

Griffan commented May 18, 2024

Hi @yfarjoun, I drafted a PR that prints out the crash site. It is on the branch "develop_branch_with_backward_cpp", and the PR is #65.

Would you mind checking out this branch and posting the backtrace info as a first step?
If that doesn't help, we may need to exchange a tiny test site so that I can debug locally.

Thanks for reporting this issue!

@yfarjoun
Contributor

Here's the stack trace:

#8    Object "VerifyBamID", at 0x1028c6912, in main + 482
#7    Object "VerifyBamID", at 0x1028c314f, in execute(int, char**) + 3183
#6    Object "VerifyBamID", at 0x1028ce465, in SVDcalculator::ProcessRefVCF(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 69
#5    Object "VerifyBamID", at 0x1028cd1b2, in SVDcalculator::ReadVcf(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<std::__1::vector<char, std::__1::allocator<char> >, std::__1::allocator<std::__1::vector<char, std::__1::allocator<char> > > >&, int&, int&) + 4034
#4    Object "VerifyBamID", at 0x102923949, in String::AsInteger() const + 25
#3    Object "VerifyBamID", at 0x102923972, in String::AsInteger(long&) const + 18
#2    Object "libsystem_platform.dylib", at 0x7ff817d55c1c, in _sigtramp + 28
#1    Object "VerifyBamID", at 0x1028c74dd, in backward::SignalHandling::sig_handler(int, __siginfo*, void*) + 13
#0    Object "VerifyBamID", at 0x1028c7546, in backward::SignalHandling::handleSignal(int, __siginfo*, void*) + 70
[1]    67703 segmentation fault  /Users/yossifarjoun/VerifyBamID/bin/VerifyBamID --RefVCF  --Reference 

@Griffan
Owner

Griffan commented May 19, 2024

Thanks, @yfarjoun! Could you also post a few lines of the VCF file? It seems to be related to the GT or PL fields.

@yfarjoun
Contributor

I'm working with the 1000genomes file as input, so there's very little "secret" data here....

I added some fprintf lines and I have the format field that seems to throw it off.

I printed the position at each VCF iteration, and the sample index (and ID) at the beginning of each sample iteration.
The last position printed before the trace was 1228424, and the last sample index was 648 (HG01849).

So I ran zless FILE.vcf.gz | sed -n '/1228424/,$p' | head | cut -f 1-9,649 (adding 1 to the sample index since cut is 1-indexed) and got:

chr1	1228424	.	C	T	681417	PASS	AC=933;AF=0.187;AN=4992;BaseQRankSum=0.211;ClippingRankSum=0.081;DP=111710;ExcessHet=3.0318;FS=0.521;InbreedingCoeff=-0.0179;MLEAC=952;MLEAF=0.191;MQ=59.76;MQ0=0;MQRankSum=-0.062;NEGATIVE_TRAIN_SITE;POSITIVE_TRAIN_SITE;QD=16.18;ReadPosRankSum=0.54;SOR=0.726;VQSLOD=0.993;culprit=DP	GT:AB:AD:DP:GQ:PGT:PID:PL	0/0:.:46,0:46:54:.:.:0,54,810

I think both GT and PL look fine, so I'm not sure what the problem is.

@yfarjoun
Contributor

yfarjoun commented May 19, 2024

Here's the VCF up to the problematic line:

subset.vcf.gz

@yfarjoun
Contributor

When I tried printing just phred[0], I got a ".", so something is off in the parsing of that line.

@Griffan
Owner

Griffan commented May 19, 2024

This is the problematic sample: "./.:.:42,0:42:.:.:.:.". I will try to fix it in the Debugging Mode PR.
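
This sample appears consistent with the stack trace earlier in the thread, where `String::AsInteger` is reached from `SVDcalculator::ReadVcf`: with the FORMAT string `GT:AB:AD:DP:GQ:PGT:PID:PL`, its PL entry is `.` rather than three integers. The sketch below shows one defensive way to read PL in Python. It is not the project's C++ code, and `parse_pl` is a hypothetical helper used only for illustration.

```python
from typing import List, Optional


def parse_pl(format_field: str, sample_field: str) -> Optional[List[int]]:
    """Return the PL values for one sample, or None when PL is missing.

    Hypothetical helper: pairs FORMAT keys with the sample's values and
    refuses to convert "." (the VCF missing-value marker) to an integer.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "PL" not in keys:
        return None
    idx = keys.index("PL")
    if idx >= len(values):
        # VCF allows trailing missing fields to be dropped entirely.
        return None
    parts = values[idx].split(",")
    if any(p == "." for p in parts):
        return None
    return [int(p) for p in parts]


fmt = "GT:AB:AD:DP:GQ:PGT:PID:PL"
print(parse_pl(fmt, "0/0:.:46,0:46:54:.:.:0,54,810"))  # [0, 54, 810]
print(parse_pl(fmt, "./.:.:42,0:42:.:.:.:."))          # None
```

A caller can then skip or flag the sample when `None` comes back instead of indexing into an empty result.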

@hyunminkang
Collaborator

Having missing genotypes may result in inaccurate PCA estimates. I would advise removing variants with many missing genotypes and filling in the remaining missing genotypes with best-guess genotypes (or dosages) before calculating PCs, which should be the common practice.
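
The two-step cleanup described above can be sketched on plain genotype strings. This is a rough illustration under loud assumptions: the 5% default threshold is arbitrary, `filter_and_fill` is a made-up name, and filling with the most common observed genotype at a site is only a naive stand-in for best-guess genotypes, not real imputation.

```python
from collections import Counter
from typing import Dict, List

# GT values treated as missing in this sketch.
MISSING = {"./.", ".|.", "."}


def filter_and_fill(
    genotypes: Dict[str, List[str]], max_missing_rate: float = 0.05
) -> Dict[str, List[str]]:
    """Drop sites with too many missing calls, then fill the rest.

    `genotypes` maps a site label to one GT string per sample.
    The fill value is the most common observed genotype at that site
    (an assumption made for illustration only).
    """
    cleaned: Dict[str, List[str]] = {}
    for site, calls in genotypes.items():
        if not calls:
            continue
        observed = [g for g in calls if g not in MISSING]
        n_missing = len(calls) - len(observed)
        if not observed or n_missing / len(calls) > max_missing_rate:
            continue  # too sparse to be useful for PCA
        fill = Counter(observed).most_common(1)[0][0]
        cleaned[site] = [fill if g in MISSING else g for g in calls]
    return cleaned


example = {
    "siteA": ["0/0", "0/0", "./.", "0/1"],  # 25% missing: kept and filled
    "siteB": ["./.", "./.", "0/0", "0/1"],  # 50% missing: dropped
}
print(filter_and_fill(example, max_missing_rate=0.25))
```

In practice a dedicated imputation tool would replace the fill step; the sketch only shows the order of operations (filter sparse sites first, then fill what is left).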

@Griffan
Owner

Griffan commented May 20, 2024

> Having missing genotypes may result in inaccurate PCA estimates. I would advise removing variants with many missing genotypes and filling in the remaining missing genotypes with best-guess genotypes (or dosages) before calculating PCs, which should be the common practice.

I have updated PR #65 to apply QC filters to each VCF record.

@yfarjoun
Contributor

Thanks. When I looked for the errant sample using cut, I forgot to add 9 to the index to account for the fixed columns... 🤣
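
The off-by-nine follows from the VCF layout: nine fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT) precede the first sample. A small sketch of the conversion to a 1-based `cut -f` column, assuming the sample index printed during debugging was 0-based (`cut_column_for_sample` is an illustrative name):

```python
# Number of fixed VCF columns before the first sample column.
N_FIXED_COLUMNS = 9


def cut_column_for_sample(sample_index: int) -> int:
    """Map a 0-based sample index to the 1-based column used by `cut -f`."""
    return N_FIXED_COLUMNS + sample_index + 1


print(cut_column_for_sample(648))  # 658
```

Under that assumption, sample 648 would sit in column 658 rather than 649.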

@hyunminkang thanks for the reminder. I'm looking for a way to drastically increase the sensitivity of vbid2, and for that I need to use specific SNPs. It's odd that the 1KG SNPs have many missing genotypes...

If only a few samples are problematic and have many missing genotypes, I'll filter out those samples. If not, instead of removing the sites I could impute the missing genotypes. Does that make sense?

@yfarjoun
Contributor

Not my issue to close, but my part in this issue is resolved.

@Griffan
Owner

Griffan commented May 21, 2024

> not my issue to close, but my part in this issue is resolved.

The original issue is probably different from the "--RefVCF" one, but the procedure for locating and reporting the specific crash site is the same. If anyone encounters this error in the future, please refer to the "Debugging Mode" section of the README.
I will close this issue for now.

@Griffan Griffan closed this as completed May 21, 2024