genetics-statistics/gemma2lib


GEMMA2: Genome-wide efficient ‘exact’ mixed-model analysis

GEMMA2/lib

Here we create a new version of GEMMA that acts like a toolbox for GWA, mapping and inference.

GEMMA: Genome-wide efficient ‘exact’ mixed-model analysis for association studies that accounts for population stratification and sample structure.

In this source code repository we work on the next generation of GEMMA and LMM related software with the goal of creating a library of functionality which can be used from languages such as R and Python.

GEMMA2/lib is therefore meant to be a library written in Rust, C and D. The front-end is initially written in Python. R and Packet FFIs may follow.

On our http://genenetwork.org/ systems we run GEMMA every day. As part of a Systems Genetics and Precision Medicine Project we are targeting GEMMA2/lib to be a faster and more flexible tool.

NOTICE: this software is under active development. YMMV.

For more information contact Pjotr Prins. A BLOG about the project can be found here.

INSTALL

See ./INSTALL.org.

RUNNING GEMMA2

GEMMA2 is called from the command line using the gemma2 command. Try

gemma2 --help

for a list of commands.

gemma version 1 pass through

GEMMA2 differs from GEMMA1 but adds a compatibility layer interpreting GEMMA1 switches. To do a pass-through to GEMMA1 simply use the gemma1 command:

gemma2 gemma1 [old switches]

To try the examples:

# compute Kinship matrix
gemma2 gemma1 -g ./example/mouse_hs1940.geno.txt.gz -p ./example/mouse_hs1940.pheno.txt \
    -gk -o mouse_hs1940
# run univariate LMM
gemma2 gemma1 -g ./example/mouse_hs1940.geno.txt.gz \
    -p ./example/mouse_hs1940.pheno.txt -n 1 -a ./example/mouse_hs1940.anno.txt \
    -k ./output/mouse_hs1940.cXX.txt -lmm -o mouse_hs1940_CD8_lmm

You can set the gemma1 binary with the --bin switch or the GEMMA1_BIN environment variable.
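When the pass-through is driven from a script, the same invocation can be assembled in Python. The following is a minimal sketch, not part of gemma2 itself: the helper names `gemma1_kinship_command` and `env_with_gemma1_bin` and the binary path are hypothetical, and only the switches shown in the examples above are used.

```python
import os


def gemma1_kinship_command(geno, pheno, out_prefix):
    """Build the argument list for the kinship pass-through example."""
    return ["gemma2", "gemma1", "-g", geno, "-p", pheno, "-gk", "-o", out_prefix]


def env_with_gemma1_bin(bin_path, base_env=None):
    """Return a copy of the environment with GEMMA1_BIN pointing at bin_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["GEMMA1_BIN"] = bin_path
    return env


cmd = gemma1_kinship_command(
    "./example/mouse_hs1940.geno.txt.gz",
    "./example/mouse_hs1940.pheno.txt",
    "mouse_hs1940",
)
env = env_with_gemma1_bin("/usr/local/bin/gemma")  # hypothetical path
print(" ".join(cmd))

# To actually run it (requires gemma2 and a gemma1 binary to be installed):
# import subprocess
# subprocess.run(cmd, env=env, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems with file names.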

Verbosity and logging

gemma2 is always invoked with a command (e.g. filter, grm or lmm) followed by command-specific switches. A number of generic switches can be placed before the command:

gemma2 [-vv] [--log INFO] [--debug] command [specific switches]

Repeated -v switches increase verbosity. The --log switch sets the log level (DEBUG|INFO|WARNING|ERROR), which defaults to WARNING. The --debug switch puts gemma2 in debug mode.
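The level names listed above match the standard level names in Python's logging module. As an illustration of what a threshold such as WARNING means, the sketch below lists which levels pass a given threshold. It assumes the levels follow the usual Python ordering; whether gemma2 uses the logging module internally is not stated here, and `emitted_levels` is a hypothetical helper.

```python
import logging

# Level names accepted by --log, in increasing severity.
LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR"]


def emitted_levels(threshold="WARNING"):
    """Return the level names at or above the given threshold."""
    cutoff = getattr(logging, threshold)  # e.g. logging.WARNING
    return [name for name in LEVELS if getattr(logging, name) >= cutoff]


print(emitted_levels())        # default threshold, WARNING
print(emitted_levels("INFO"))  # as with --log INFO
```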

Commands

convert command

The convert command converts plink and BIMBAM formats to R/qtl2 format. Unlike GEMMA1, GEMMA2 uses an R/qtl2-based format (from now on the GEMMA2 format) in which genotypes and phenotypes are stored in a ‘tidy’ format and metadata is represented in YAML/JSON.

If you want a quick preview use the --debug switch:

gemma2 -vv --log INFO --debug ALL convert --plink example/mouse_hs1940
cat mouse_hs1940.json
{
    "description": "mouse_hs1940",
    "crosstype": "hs",
    "sep": "\t",
    "na.strings": [
        "-",
        "NA"
    ],
    "comment.char": "#",
    "individuals": 1940,
    "markers": 12226,
    "phenotypes": 7,
    "geno": "mouse_hs1940_geno.tsv.gz",
    "pheno": "mouse_hs1940_pheno.tsv",
    "alleles": [
        "A",
        "B",
        "H"
    ],
    "genotypes": {
        "A": 1,
        "H": 2,
        "B": 3
    },
    "geno_sep": false,
    "geno_transposed": true
}

Note that this format has no concept of minor/major allele encoding as is used in plink and BIMBAM formats.
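Because the control file is plain JSON, it can be inspected with the Python standard library. The sketch below embeds a subset of the fields shown above as a string so that it runs without the example data; in practice you would read mouse_hs1940.json from disk. The variable names are illustrative only.

```python
import json

# Subset of the control file shown above. A raw string keeps the JSON
# escape "\t" intact so that json.loads can decode it.
CONTROL_JSON = r"""
{
    "description": "mouse_hs1940",
    "crosstype": "hs",
    "sep": "\t",
    "na.strings": ["-", "NA"],
    "individuals": 1940,
    "markers": 12226,
    "phenotypes": 7,
    "geno": "mouse_hs1940_geno.tsv.gz",
    "pheno": "mouse_hs1940_pheno.tsv",
    "genotypes": {"A": 1, "H": 2, "B": 3},
    "geno_transposed": true
}
"""

control = json.loads(CONTROL_JSON)

# Invert the genotype coding so numeric codes map back to their labels.
code_to_label = {code: label for label, code in control["genotypes"].items()}

print(control["description"], control["individuals"], control["markers"])
print(code_to_label)
```

The genotype labels A, H and B are mapped to codes directly, which is consistent with the format having no minor/major allele encoding.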

Convert (back) to BIMBAM

GEMMA2 can write BIMBAM from the GEMMA2 (R/qtl2) format using the export command and a control file. For example:

gemma2 -vv --log INFO --debug ALL export --bimbam -c control

filter command

grm command

Using the control file generated from convert:

gemma2 --debug --log INFO -vv grm -c mouse_hs1940.json

lmm command

LICENSE

GEMMA and GEMMA2/lib are published under the GPLv3 LICENSE.
