Kimmo Palin edited this page Apr 29, 2014 · 11 revisions

Welcome to the SLRP wiki!

SLRP is software for long-range phasing and IBD detection in inbred and/or isolated populations genotyped on dense SNP arrays. The methodology was described in Palin et al. 2011. The master branch of this repository contains a further developed version of the software with a better user interface (VCF files) and various tricks for phasing thousands of samples (the fastPreProc, slice_length, procs, float and IBDcoverLimit options).

If you need help running or installing the software, don't hesitate to contact me, Kimmo Palin, at firstname.lastname at sanger.ac.uk

If you use the software in scientific publications, please cite:

Palin, K., Campbell, H., Wright, A. F., Wilson, J. F. and Durbin, R. (2011), Identity-by-descent-based phasing and imputation in founder populations using graphical models. Genetic Epidemiology. doi: 10.1002/gepi.20635

Installation

The installation seems to be as trivial as:

pip install --user --egg git+https://github.com/kpalin/SLRP.git

Information about output files

The output VCF files from SLRP carry some per-genotype annotation that is not described elsewhere but may be useful to many users. Here is some extra description of the various FORMAT fields.

  • GP Genotype posterior probabilities. These are calculated from the diplotype probabilities in the SLRP model; the heterozygous probability is the sum of the two diplotypes representing the two alternative phases.
  • HQ Haplotype qualities. For i = 1, 2: -10 log10(probability that the i-th allele is wrong).
  • GQ Genotype quality. Phred-scaled probability of a wrong genotype call, again calculated from the diplotypes, summing over phase.
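
As a rough illustration of how these three fields relate to one another, here is a minimal Python sketch for a single biallelic site. It is not taken from the SLRP source: the function names `phred` and `format_fields`, the cap of 99, the choice of the most probable diplotype as the call, and the exact definition of "allele i is wrong" are all assumptions made for the example.

```python
import math


def phred(p_error, cap=99.0):
    """Phred-scale an error probability. The cap of 99 is an assumption."""
    if p_error <= 0.0:
        return cap
    return min(cap, -10.0 * math.log10(p_error))


def format_fields(diplotype_probs):
    """Derive GP, GQ and HQ from diplotype probabilities at a biallelic site.

    diplotype_probs maps (first_allele, second_allele) -> probability for
    the four phased diplotypes (0, 0), (0, 1), (1, 0) and (1, 1).
    """
    total = sum(diplotype_probs.values())
    probs = {d: p / total for d, p in diplotype_probs.items()}

    # GP: the heterozygous probability sums the two alternative phases.
    gp = (probs[(0, 0)], probs[(0, 1)] + probs[(1, 0)], probs[(1, 1)])

    # Assumption: the call is the most probable phased diplotype.
    call = max(probs, key=probs.get)

    # GQ: Phred-scaled probability that the unphased genotype is wrong.
    gq = phred(1.0 - gp[call[0] + call[1]])

    # HQ: for each haplotype i, Phred-scaled probability that the called
    # allele on that haplotype is wrong (hypothetical definition).
    hq = tuple(
        phred(1.0 - sum(p for d, p in probs.items() if d[i] == call[i]))
        for i in (0, 1)
    )
    return {"GP": gp, "GQ": gq, "HQ": hq}
```

For example, `format_fields({(0, 0): 0.01, (0, 1): 0.90, (1, 0): 0.08, (1, 1): 0.01})` calls the diplotype 0|1 and reports a heterozygous GP of about 0.98, because both phases contribute to it.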

Acknowledgements

Hoai Tuong Nguyen and Anne-Louise Leutenegger from INSERM have been very helpful in testing the package.

To Do

Here are a few ideas for improving SLRP:

  • Calculate marginals (forward-backward, sum-product) instead of max-marginals (Viterbi, max-product). It might even be faster. Scale the values on each node to sum to one, and avoid branching like the plague.
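
To make the idea concrete, here is a minimal sketch of sum-product with per-node scaling on a plain chain (an ordinary hidden Markov model). It is not SLRP's actual graphical model or code; the function name `forward_backward` and its arguments are hypothetical, and every position is assumed to have at least one state with non-zero probability. Normalising each message to sum to one keeps the values in a safe floating-point range without log-space arithmetic or special-case branches.

```python
def forward_backward(init, trans, emit):
    """Posterior state marginals on a chain, via scaled sum-product.

    init[s]     : prior probability of state s at the first position
    trans[s][t] : probability of moving from state s to state t
    emit[k][s]  : likelihood of the observation at position k in state s
    """
    n_states = len(init)
    n = len(emit)
    states = range(n_states)

    # Forward pass: each message is rescaled to sum to one.
    fwd = []
    for k in range(n):
        if k == 0:
            row = [init[s] * emit[0][s] for s in states]
        else:
            prev = fwd[-1]
            row = [emit[k][t] * sum(prev[s] * trans[s][t] for s in states)
                   for t in states]
        scale = sum(row)
        fwd.append([v / scale for v in row])

    # Backward pass with the same per-node scaling.
    bwd = [None] * n
    bwd[n - 1] = [1.0 / n_states] * n_states
    for k in range(n - 2, -1, -1):
        row = [sum(trans[s][t] * emit[k + 1][t] * bwd[k + 1][t] for t in states)
               for s in states]
        scale = sum(row)
        bwd[k] = [v / scale for v in row]

    # Marginals are proportional to the product of the two messages.
    post = []
    for f, b in zip(fwd, bwd):
        row = [x * y for x, y in zip(f, b)]
        z = sum(row)
        post.append([v / z for v in row])
    return post
```

Because each message is renormalised, the scale factors cancel when the forward and backward messages are multiplied and normalised, so the returned rows are the exact posterior marginals for the chain.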