
Command line options

kpalin edited this page Nov 3, 2011 · 2 revisions

Command line options for SLRP

Systematic Long Range Phasing (SLRP) phases and imputes genotypes from isolated founder populations, or from other populations where each individual has at least one other individual sharing each of his/her chromosomes. The software is command line based; it reads several input formats and writes a few others. I recommend using VCF as the input and output format, as it seems to be the most versatile and best defined. The FAD format is old and should not be used, but its specification can be found here.

The list of options below is quite long, but to get started you only need "--vcfFile" and "--geneticMap", which define your input genotypes and the genetic map of the chromosome.
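
As a rough illustration of the minimal invocation described above, the following Python sketch assembles an argument list that uses only the two required inputs, plus two optional settings. The helper name `build_slrp_command` and the file names are hypothetical; only the option names are taken from the list below. It also assumes that an executable called `SLRP` is available on the PATH.

```python
import subprocess


def build_slrp_command(vcf_file, genetic_map, out_vcf=None, procs=None):
    """Assemble an SLRP argument list (hypothetical helper, not part of SLRP)."""
    cmd = ["SLRP", f"--vcfFile={vcf_file}", f"--geneticMap={genetic_map}"]
    if out_vcf is not None:
        cmd.append(f"--outVCF={out_vcf}")
    if procs is not None:
        cmd.append(f"--procs={procs}")
    return cmd


# Placeholder file names, for illustration only.
cmd = build_slrp_command("genotypes.vcf.gz", "genetic_map_chr1.txt")
print(" ".join(cmd))

# To actually launch the program (assumes SLRP is installed and on PATH):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file names contain spaces.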

Usage: SLRP [options]

Options
--version             show program's version number and exit
-h, --help            show this help message and exit

Input options: Options defining the input files. All file options accept gzip-compressed files.

-f FILE, --fadFile=FILE
                    Name of the genotype file in FAD format.  [default:[]]
-v FILE, --vcfFile=FILE
                    Name of the VCF file with genotypes.
-t FILE, --tpedFile=FILE
                    Name of the genotype file in tped format with
                    genotypes coded as A,C,G,T.  [default:[]]
-C CHROM, --chrom=CHROM
                    Use this chromosome (for tped input, this must be
                    1-22,X,Y,XY or MT)  [default:none]
-m FILE, --geneticMap=FILE
                    Input file with the genetic map in hapmap format
                    [default:none]
--loadLikelihoods=FILE
                    Load initial likelihoods from a file (probably not a
                    sensible thing to do) [default:none]
-i FILE, --ibdFile=FILE
                    Name of the input IBD file  [default:none]
-D FILE, --densePanel=FILE
                    File listing the individuals in the 'dense' genotype
                    panel. This is the set of individuals used as source
                    of genotype imputation or target of genotype
                    improvement (or phasing with prephased non-dense
                    haplotypes). Note that white spaces in the individual
                    names will be replaced with ':'. [default:none]
                    IMPUTATION TO A DENSE PANEL IS CURRENTLY UNTESTED AND
                    LIKELY NOT TO WORK!
-l NUM, --scoreLim=NUM
                    Minimum score above which to use the IBD matches
                    loaded from a file [default:0.0]
-R FILE, --freq=FILE
                    File to read the allele frequencies from.
                    [default:none]
--famFile=FILE      Name of the file listing the sample names in the FAD
                    file.  [default:none]
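
The --densePanel description notes that white space in individual names is replaced with ':'. A minimal sketch of that kind of normalization is shown here, which may help when preparing a panel list whose names must match. The function name is hypothetical, and whether SLRP replaces each white space character separately or trims the ends of a name first is an assumption, not something the help text states.

```python
import re


def normalize_individual_name(name):
    """Replace white space in a sample name with ':' (illustrative only).

    Assumptions: leading and trailing white space is dropped, and every
    remaining white space character becomes one ':'.
    """
    return re.sub(r"\s", ":", name.strip())


print(normalize_individual_name("FAM 01 IND 7"))
```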

Output options: Options defining the files in which to store the output. Most options will gzip-compress files whose names end in .gz

-o FILE, --outFile=FILE
                    Name of the long range phased FAD output file.
                    Deprecated because of difficulties with allele coding.
                    Recommend using --freq with this  [default:none]
-O FILE, --outVCF=FILE
                    Name of the long range phased VCF output file
                    [default:SLRP_12074.vcf]
--IMPUTEfile=FILE   Output the imputed results to FILE in IMPUTE format
                    [default:none]
-S FILE, --ibdSegCalls=FILE
                    Output IBD segments between the individuals to FILE
                    [default:none]
-Q FILE, --outQualities=FILE
                    Output quality scores for each site. The scores are
                    negative base 10 log posterior probability ratios
                    between the two phasings. (Makes no sense for
                    homozygous sites)
                    [default:none]
-c FILE, --ibdCover=FILE
                    Output the number of individuals sharing a tract IBD.
                    [default:none]
-W FILE, --writeFreq=FILE
                    File to write the allele frequency estimates to
                    [default:none]
-L FILE, --likeFile=FILE
                    Name of the file for final MAP values as negative
                    natural logarithm  [default:none]
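
The output options above compress files whose names end in .gz. The same convention is easy to follow in downstream scripts. The sketch below shows one way to do it in Python with the standard `gzip` module; `open_output` is a hypothetical helper and is not how SLRP itself is known to be implemented.

```python
import gzip


def open_output(path):
    """Open path for text writing, gzip-compressing when it ends in .gz."""
    if path.endswith(".gz"):
        # "wt" makes gzip.open return a text-mode file object.
        return gzip.open(path, "wt")
    return open(path, "w")


# Example use (placeholder file name):
# with open_output("phased.vcf.gz") as handle:
#     handle.write("##fileformat=VCFv4.1\n")
```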

Model parameter options: Options specifying the model parameters that affect the results.

--ExpectedIBS=NUM   Expected length of a non-IBD IBS segment (in
                    centiMorgans) [default:1.0]
--ExpectedIBD=NUM   Expected length of an IBD segment (in centiMorgans)
                    [default:10.0]
-p NUM, --prob_ibd=NUM
                    Probability of two haplotypes in the population to
                    match  [default:none]
--IBDtransLimit=NUM
                    Upper limit for probability of noIBD to IBD transition
                    between two markers.  Default taken from population
                    probabilities. [default: 4 * prob_ibd]
-T NUM, --CallThreshold=NUM
                    Minimum fold difference in posterior probability of
                    two most probable phases to be called. If given, takes
                    precedence over phredThreshold.  [default:none]
-P NUM, --phredThreshold=NUM
                    Minimum phred scaled quality score to call phases.
                    [default:10.0]
-e NUM, --genotypingErrRate=NUM
                    Estimated genotyping error rate  [default:0.001]
--outputInferredGenotypes
                    Output haplotypes that might have different genotypes
                    from input. These might be more accurate than called
                    from the genotyping assay.  [default:False]
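
The --CallThreshold and --phredThreshold options express the same kind of cutoff on two scales: a fold difference between the two most probable phases, and a phred-scaled score. The sketch below converts between the two under the assumption that the phred-scaled score is 10 * log10 of the fold difference. The help text does not confirm this exact definition, so treat the numbers as illustrative; the function names are hypothetical.

```python
import math


def fold_to_phred(fold):
    """Phred-like score for a posterior fold difference.

    ASSUMPTION: score = 10 * log10(fold); SLRP's exact definition may differ.
    """
    if fold <= 0:
        raise ValueError("fold difference must be positive")
    return 10.0 * math.log10(fold)


def phred_to_fold(phred):
    """Inverse of fold_to_phred under the same assumption."""
    return 10.0 ** (phred / 10.0)


# Under this assumption, the default --phredThreshold of 10.0 corresponds
# to calling a phase only when it is 10 times more probable than the other.
print(phred_to_fold(10.0))
```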

Computational complexity options: Options mostly affecting computational complexity of the process. These should only be adjusted if the program takes too much time or memory. The effect of these options on the quality of the output should be low, but your mileage may vary.

--float             Use floats, instead of doubles. Saves half the memory
                    but might affect numerical precision and stability.
                    [default:False]
-n NUM, --procs=NUM
                    Number of processors to use in parallel  [default:1]
--slice_length=NUM  Length of alignment slice in markers (to save memory)
                    [default:-1]
-F, --fastPreProc   Do pre-processing (putative IBD segment finding) with
                    fast sweepline method, which disregards prior phase
                    information and IBD-noIBD transition probabilities
                    [default:False]
--IBDcoverLimit=NUM
                    Soft lower limit for number of IBD sharing. Only do
                    message passing on the longest NUM segments
                    covering a locus. The selection is greedy, hence there
                    might be more than minimum number of segments used.
                    For large number of individuals, 10 might be a good
                    value. A non-positive value turns off this limit.
                    [default: 15]
--minIBDlength=NUM  Hard lower limit for length of IBD segment in markers.
                    This should speed up computation by disregarding
                    uninformative IBS segments. [default: 10]
-I NUM, --iterations=NUM
                    Maximum number of iterations to run the message
                    passing. [default:30]
-d NUM, --damping=NUM
                    Damping factor for message updates  [default:0.75]
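
For readers unfamiliar with damping in message passing, the sketch below shows one common form of a damped update, in which the new message is a weighted mix of the previous message and the freshly computed one. Whether SLRP weights the old or the new message by the --damping value, and whether it works on probabilities or log values, is not stated in the help text. The convention and the function name used here are assumptions.

```python
def damped_update(old_message, proposed_message, damping=0.75):
    """Mix the previous and the newly computed message values.

    ASSUMPTION: `damping` is the weight kept on the old message, so
    damping=0 means no damping and values near 1 change messages slowly.
    """
    if not 0.0 <= damping <= 1.0:
        raise ValueError("damping must be between 0 and 1")
    return [
        damping * old + (1.0 - damping) * new
        for old, new in zip(old_message, proposed_message)
    ]


# With damping 0.75, a message moves a quarter of the way to its new value.
print(damped_update([0.0, 1.0], [1.0, 0.0]))
```

Damping of this kind usually trades speed of convergence for stability, which is why it sits next to --iterations among the computational options.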

Untouchable options: These options are of no use to regular users and should not be touched. They are likely to crash the program or give completely wrong results. YOU HAVE BEEN WARNED!

--verbose           Output some more diagnostics  [default:False]
--seed=NUM          Seed for random number generator  [default:none]
--use_sum_product   Use sum-product (marginal posteriors) instead of
                    max-product (maximum a posteriori) algorithm.
--test              Activate some testing thingies  [default:False]
--intermediate      Save intermediate FAD and IBD files  [default:False]
--mpi               Use MPI to distribute the computational and memory
                    load  [default:False]