Get variant information from Impute2File #3

Open
legaultmarc opened this issue Jan 19, 2015 · 4 comments

Comments

@legaultmarc
Owner

It would be nice to have a fast function to retrieve only the information on variants from an Impute2File.

@legaultmarc
Owner Author

If Issue #2 is implemented first, we could directly generate Variant objects with the proper reference allele instead of relying on the "major" and "minor" alleles. Also, this method should not return the MAF, since computing it requires the dosage values.

Perhaps we could create two methods, .get_variant_information and .as_variant_list, that would return a pandas DataFrame and a list of Variant objects, respectively.

@legaultmarc
Owner Author

What do you think about this, @lemieuxl?

@lemieuxl
Collaborator

The two methods seem appropriate.

The .get_variant_information() could return a DataFrame containing the following columns:

  1. name (as index?)
  2. chrom
  3. pos
  4. ref (as encoded in the impute2 file)
  5. alt (as encoded in the impute2 file)

The .get_variant_list() could return a simple list of variants (either snp or indel).
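As a rough illustration of the variant-only read discussed above, here is a minimal sketch using only the standard library. It assumes space-delimited lines whose first five fields are chrom, name, pos, a1 and a2 (the column order is an assumption and may not match what Impute2File expects). `VariantInfo` and `read_variant_information` are hypothetical names, not part of the existing API. Splitting with a maximum of five splits leaves the genotype probabilities as one unparsed string, which is what would make this faster than a full parse.

```python
import io
from collections import namedtuple

# Hypothetical record type; the project's real Variant class may differ.
VariantInfo = namedtuple("VariantInfo", ["name", "chrom", "pos", "a1", "a2"])


def read_variant_information(handle):
    """Yield only the variant columns of an IMPUTE2-like file.

    Assumption: each line is space-delimited and starts with
    chrom, name, pos, a1, a2, followed by genotype probabilities.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        # maxsplit=5 keeps all the probabilities in one unparsed chunk.
        chrom, name, pos, a1, a2 = line.split(" ", 5)[:5]
        yield VariantInfo(name, chrom, int(pos), a1, a2)


# Small in-memory example with two variants and two samples.
example = io.StringIO(
    "1 rs123 1000 A G 1 0 0 0 1 0\n"
    "1 . 2000 AT A 0 0 1 0.1 0.8 0.1\n"
)
variants = list(read_variant_information(example))
```

The resulting records could then be handed to a DataFrame constructor for the .get_variant_information() case, or converted to Variant objects for the list case.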

For the dosage function, it is important to keep major and minor, since the dosage is computed so that a value of 2 represents a homozygote for the rare allele (and not a homozygote for the alternative allele, since the reference allele is not always the most common one in the population).
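To make the major/minor point concrete, here is a small sketch of a dosage computed with respect to the rare allele. It assumes each sample has a probability triple (a1/a1, a1/a2, a2/a2); `minor_allele_dosage` is a hypothetical helper and is not necessarily how the existing dosage function is implemented.

```python
def minor_allele_dosage(probs, a1, a2):
    """Return (dosage, minor, major) for one variant.

    probs: non-empty list of (p_a1a1, p_a1a2, p_a2a2), one per sample.
    """
    # Expected number of copies of a2 for each sample.
    dosage = [p_het + 2 * p_hom_a2 for _, p_het, p_hom_a2 in probs]
    freq_a2 = sum(dosage) / (2 * len(dosage))
    if freq_a2 > 0.5:
        # a2 is the common allele here, so count copies of a1 instead.
        return [2 - d for d in dosage], a1, a2
    return dosage, a2, a1


# Four samples where a2 ("G") is the more frequent allele.
probs = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
dosage, minor, major = minor_allele_dosage(probs, "A", "G")
```

In this example a2 has a frequency above 0.5, so the dosage is flipped and a value of 2 means homozygous for "A", whichever allele the file lists first.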

@legaultmarc
Owner Author

I don't really agree with having the name as index because we can expect a lot of missing values. Also, names could be inconsistent across dbSNP builds, or for similar reasons.
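A toy illustration of the concern, assuming missing names are encoded as "." (an assumption about the input files): keying by name collapses unnamed variants, while keying by chrom, pos and alleles keeps them distinct.

```python
# Records as (chrom, name, pos, a1, a2); "." marks a missing name.
records = [
    ("1", ".", 1000, "A", "G"),
    ("1", ".", 2000, "C", "T"),
    ("1", "rs42", 3000, "G", "A"),
]

# Using the name as key: the two unnamed variants overwrite each other.
by_name = {rec[1]: rec for rec in records}

# Using the locus and alleles as key: every variant stays addressable.
by_locus = {(rec[0], rec[2], rec[3], rec[4]): rec for rec in records}
```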

Marc-André Legault
GitHub (for code) http://github.com/legaultmarc
ATGCIO (for words) http://www.atgcio.blogspot.com/
