A pipeline that uses PCA on 1000 Genomes data and WGS data from your own samples to determine or validate the ancestry of an individual.

laura-budurlean/PCA-Ethnicity-Determination-from-WGS-Data

PCA-Ethnicity-Determination-from-WGS-Data

A pipeline that uses 1000 Genomes data and WGS data from your own samples to determine or validate the ethnicity of an individual.

The goal of this pipeline is to determine the ancestry of an individual from sequencing data (SNPs), starting from variant call (VCF) files for those individuals called against hg38. The cohort data are combined with 1000 Genomes data and PCA is performed on the combined set. The PCA scores are then plotted alongside the 1000 Genomes samples to show where each individual falls on the overall ancestry PCA plot.
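To make the combine-then-PCA idea concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the genotype values (0/1/2 alternate-allele counts), the two unnamed reference groups, and the helper names `center_columns` and `pc1_scores` are all hypothetical, and the first principal component is found by simple power iteration rather than by whatever tool the bash script actually calls.

```python
import math

# Toy genotype matrix rows (samples) x columns (SNPs), coded as 0/1/2
# alternate-allele counts. All values are made up for illustration.
REFERENCE_A = [[0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1]]
REFERENCE_B = [[2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1]]
COHORT_SAMPLE = [0, 1, 2, 1]


def center_columns(rows):
    """Subtract each SNP's mean genotype so PCA sees deviations, not raw counts."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[g - m for g, m in zip(row, means)] for row in rows]


def pc1_scores(rows, iterations=200):
    """Score every sample on the first principal component (power iteration)."""
    centered = center_columns(rows)
    n_snps = len(centered[0])
    # Deterministic, non-uniform starting direction over the SNPs.
    axis = [float(j + 1) for j in range(n_snps)]
    for _ in range(iterations):
        scores = [sum(g * w for g, w in zip(row, axis)) for row in centered]
        # One step of multiplying by X^T X, then renormalizing.
        axis = [sum(s * row[j] for s, row in zip(scores, centered))
                for j in range(n_snps)]
        norm = math.sqrt(sum(w * w for w in axis))
        if norm == 0:
            raise ValueError("no variance in the genotype matrix")
        axis = [w / norm for w in axis]
    return [sum(g * w for g, w in zip(row, axis)) for row in centered]


# Combine the reference panel and the cohort, then run PCA on the joint matrix.
combined = REFERENCE_A + REFERENCE_B + [COHORT_SAMPLE]
scores = pc1_scores(combined)
for name, score in zip(["A1", "A2", "A3", "B1", "B2", "B3", "cohort"], scores):
    print(f"{name}\t{score:.3f}")
```

The sign of a principal component is arbitrary, so only relative positions matter: in this toy data the cohort sample lands on the same side of PC1 as group A and on the opposite side from group B.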

Some requirements for this pipeline:

Instructions:

  1. Perform the steps outlined in the bash script 1-determine-ancestry-by-PCA
  2. In R, perform the steps outlined in 2-plot.R

The output of this ancestry-calling pipeline is a plot of the 1000 Genomes super populations with your own samples overlaid on the super population they most closely resemble based on the SNV data.

[Figure: example PCA plot (example_PCA_for_github)]
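The pipeline itself reports ancestry visually, through the plot. If a numeric check alongside the plot is useful, one option is to assign each sample to the super population whose centroid is nearest in PC space. The sketch below assumes you already have PC1/PC2 scores for the reference samples and for your own sample; the coordinates are invented, and `superpop_centroids` and `closest_superpop` are hypothetical helper names, not functions from 2-plot.R.

```python
import math
from collections import defaultdict

# Toy (PC1, PC2) scores for reference samples with their 1000 Genomes
# super population labels. The coordinates are illustrative only.
REF_SCORES = [(-0.04, 0.01), (-0.05, 0.02),
              (0.03, 0.04), (0.02, 0.05),
              (0.03, -0.04), (0.04, -0.05)]
REF_LABELS = ["AFR", "AFR", "EAS", "EAS", "EUR", "EUR"]


def superpop_centroids(ref_scores, ref_labels):
    """Average the PC coordinates of the reference samples in each group."""
    grouped = defaultdict(list)
    for point, label in zip(ref_scores, ref_labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in grouped.items()
    }


def closest_superpop(sample, centroids):
    """Return the label whose centroid is nearest (Euclidean) to the sample."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))


centroids = superpop_centroids(REF_SCORES, REF_LABELS)
my_sample = (0.025, -0.03)  # hypothetical PC scores for one cohort sample
print(closest_superpop(my_sample, centroids))
```

A nearest-centroid rule is a deliberate simplification: it ignores how spread out each super population is, so samples that fall between clusters (for example, admixed individuals) are still best judged on the plot.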
