1000 Genomes Project

The Experts below are selected from a list of 22116 Experts worldwide ranked by ideXlab platform

Qasim Ayub - One of the best experts on this subject based on the ideXlab platform.

  • Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences
    Nature Genetics, 2016
    Co-Authors: David G Poznik, Shane McCarthy, Fernando L Mendez, Thomas Willems, Andrea Massaia, Melissa A Wilson Sayres, Qasim Ayub, Apurva Narechania, Seva Kashin, Yuan Chen
    Abstract:

    Chris Tyler-Smith, Carlos Bustamante and colleagues report an analysis of 1,244 human Y chromosomes from the 1000 Genomes Project. They find that copy number variants have a higher predicted functional impact than other variant classes and infer bursts of male population expansion corresponding to historical periods of migration and technological innovations.

  • Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences
    Nature Genetics, 2016
    Co-Authors: David G Poznik, Shane McCarthy, Fernando L Mendez, Thomas Willems, Andrea Massaia, Melissa A Wilson Sayres, Qasim Ayub, Apurva Narechania, Yali Xue, Seva Kashin
    Abstract:

    We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including single-nucleotide variants, multiple-nucleotide variants, insertions and deletions, short tandem repeats, and copy number variants. Of these, copy number variants contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree on the basis of binary single-nucleotide variants and projected the more complex variants onto it, estimating the number of mutations for each class. Our phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.

  • Wide distribution and altitude correlation of an archaic high-altitude adaptive EPAS1 haplotype in the Himalayas
    Human Genetics, 2016
    Co-Authors: Sophie Hackinger, Thirsa Kraaijenbrink, Massimo Mezzavilla, Mark A Jobling, Peter De Knijff, Chris Tyler-Smith, Yali Xue, George Van Driem, Qasim Ayub
    Abstract:

    High-altitude adaptation in Tibetans is influenced by introgression of a 32.7-kb haplotype from the Denisovans, an extinct branch of archaic humans, lying within the endothelial PAS domain protein 1 (EPAS1), and has also been reported in Sherpa. We genotyped 19 variants in this genomic region in 1507 Eurasian individuals, including 1188 from Bhutan and Nepal residing at altitudes between 86 and 4550 m above sea level. Derived alleles for five SNPs characterizing the core Denisovan haplotype (AGGAA) were present at high frequency not only in Tibetans and Sherpa, but also among many populations from the Himalayas, showing a significant correlation with altitude (Spearman's correlation coefficient = 0.75, p value 3.9 × 10⁻¹¹). Seven East- and South-Asian 1000 Genomes Project individuals shared the Denisovan haplotype extending beyond the 32-kb region, enabling us to refine the haplotype structure and identify a candidate regulatory variant (rs370299814) that might be interacting in an additive manner with the derived G allele of rs150877473, the variant previously associated with high-altitude adaptation in Tibetans. Denisovan-derived alleles were also observed at frequencies of 3–14% in the 1000 Genomes Project African samples. The closest African haplotype is, however, separated from the Asian high-altitude haplotype by 22 mutations whereas only three mutations, including rs150877473, separate the Asians from the Denisovan, consistent with distant shared ancestry for African and Asian haplotypes and Denisovan adaptive introgression.
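
The altitude correlation quoted in this abstract is a Spearman rank correlation. As a rough illustration of how such a coefficient is obtained, the sketch below ranks both variables (averaging tied ranks) and takes the Pearson correlation of the ranks. The altitudes and haplotype frequencies are made-up placeholder values, not data from the study, and the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical population altitudes (m) and derived-haplotype frequencies.
altitude_m = [86, 500, 1200, 2100, 3000, 3800, 4550]
haplotype_freq = [0.00, 0.02, 0.01, 0.10, 0.25, 0.22, 0.40]
rho = spearman_rho(altitude_m, haplotype_freq)
print(f"Spearman rho = {rho:.3f}")
```

Because only ranks enter the calculation, the coefficient measures whether frequency rises monotonically with altitude rather than whether the relationship is linear.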

Sophie Hackinger - One of the best experts on this subject based on the ideXlab platform.

  • Wide distribution and altitude correlation of an archaic high-altitude adaptive EPAS1 haplotype in the Himalayas
    Human Genetics, 2016
    Co-Authors: Sophie Hackinger, Thirsa Kraaijenbrink, Massimo Mezzavilla, Mark A Jobling, Peter De Knijff, Chris Tyler-Smith, Yali Xue, George Van Driem, Qasim Ayub
    Abstract:

    High-altitude adaptation in Tibetans is influenced by introgression of a 32.7-kb haplotype from the Denisovans, an extinct branch of archaic humans, lying within the endothelial PAS domain protein 1 (EPAS1), and has also been reported in Sherpa. We genotyped 19 variants in this genomic region in 1507 Eurasian individuals, including 1188 from Bhutan and Nepal residing at altitudes between 86 and 4550 m above sea level. Derived alleles for five SNPs characterizing the core Denisovan haplotype (AGGAA) were present at high frequency not only in Tibetans and Sherpa, but also among many populations from the Himalayas, showing a significant correlation with altitude (Spearman's correlation coefficient = 0.75, p value 3.9 × 10⁻¹¹). Seven East- and South-Asian 1000 Genomes Project individuals shared the Denisovan haplotype extending beyond the 32-kb region, enabling us to refine the haplotype structure and identify a candidate regulatory variant (rs370299814) that might be interacting in an additive manner with the derived G allele of rs150877473, the variant previously associated with high-altitude adaptation in Tibetans. Denisovan-derived alleles were also observed at frequencies of 3–14% in the 1000 Genomes Project African samples. The closest African haplotype is, however, separated from the Asian high-altitude haplotype by 22 mutations whereas only three mutations, including rs150877473, separate the Asians from the Denisovan, consistent with distant shared ancestry for African and Asian haplotypes and Denisovan adaptive introgression.

Alexander E Urban - One of the best experts on this subject based on the ideXlab platform.

  • Comprehensive performance comparison of high-resolution array platforms for genome-wide Copy Number Variation (CNV) analysis in humans
    BMC Genomics, 2017
    Co-Authors: Rajini R Haraksingh, Alexej Abyzov, Alexander E Urban
    Abstract:

    High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. A new generation of arrays combining high probe densities with optimized designs will comprise essential tools for genome analysis in the coming years. We systematically compared the genome-wide CNV detection power of all 17 available array designs from the Affymetrix, Agilent, and Illumina platforms by hybridizing the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays, and performing data analysis using both manufacturer-recommended and platform-independent software. We benchmarked the resulting CNV call sets from each array using a gold standard set of CNVs for this genome derived from 1000 Genomes Project whole genome sequencing data. The arrays tested comprise both SNP and aCGH platforms with varying designs and contain between ~0.5 and ~4.6 million probes. Across the arrays CNV detection varied widely in number of CNV calls (4–489), CNV size range (~40 bp to ~8 Mbp), and percentage of non-validated CNVs (0–86%). We discovered strikingly strong effects of specific array design principles on performance. For example, some SNP array designs with the largest numbers of probes and extensive exonic coverage produced a considerable number of CNV calls that could not be validated, compared to designs with probe numbers that are sometimes an order of magnitude smaller. This effect was only partially ameliorated using different analysis software and optimizing data analysis parameters. High-resolution microarrays will continue to be used as reliable, cost- and time-efficient tools for CNV analysis. However, different applications tolerate different limitations in CNV detection. Our study quantified how these arrays differ in total number and size range of detected CNVs as well as sensitivity, and determined how each array balances these attributes. This analysis will inform appropriate array selection for future CNV studies, and allow better assessment of the CNV-analytical power of both published and ongoing array-based genomics studies. Furthermore, our findings emphasize the importance of concurrent use of multiple analysis algorithms and independent experimental validation in array-based CNV detection studies.
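
The benchmarking step described in this abstract, scoring each array's CNV calls against a gold-standard set, can be pictured with a small sketch. The abstract does not state the matching rule, so the 50% reciprocal-overlap criterion used here is an assumption chosen for illustration, and the interval coordinates and function names are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for intervals (chrom, start, end)."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def percent_non_validated(calls, gold, min_overlap=0.5):
    """Percentage of calls with no gold-standard CNV meeting the overlap cutoff.

    min_overlap=0.5 is an assumed threshold, not one taken from the study.
    """
    if not calls:
        return 0.0
    unvalidated = sum(
        1
        for call in calls
        if not any(reciprocal_overlap(call, g) >= min_overlap for g in gold)
    )
    return 100.0 * unvalidated / len(calls)


# Hypothetical gold-standard CNVs and array calls.
gold = [("chr1", 1000, 2000), ("chr2", 5000, 9000)]
calls = [
    ("chr1", 1100, 2100),  # 90% reciprocal overlap: validated
    ("chr2", 100, 400),    # no overlap
    ("chr2", 5000, 6000),  # covers only 25% of the gold CNV
    ("chr3", 10, 50),      # chromosome absent from the gold set
]
pct = percent_non_validated(calls, gold)
print(f"{pct:.0f}% of calls not validated")
```

A reciprocal criterion penalises both oversized and undersized calls, which is one reason it is a common choice for this kind of comparison.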

Zhaoming Wang - One of the best experts on this subject based on the ideXlab platform.

  • Meta-analysis of genome-wide association studies identifies multiple lung cancer susceptibility loci in never-smoking Asian women
    Human Molecular Genetics, 2016
    Co-Authors: Zhaoming Wang, Wei Jie Seow, Kouya Shiraishi, Chao A Hsiung, Keitaro Matsuo, Jie Liu, Kexin Chen, Taiki Yamaji, Yang Yang
    Abstract:

    Genome-wide association studies (GWAS) of lung cancer in Asian never-smoking women have previously identified six susceptibility loci associated with lung cancer risk. To further discover new susceptibility loci, we imputed data from four GWAS of Asian non-smoking female lung cancer (6877 cases and 6277 controls) using the 1000 Genomes Project (Phase 1 Release 3) data as the reference and genotyped additional samples (5878 cases and 7046 controls) for possible replication. In our meta-analysis, three new loci achieved genome-wide significance, marked by single nucleotide polymorphism (SNP) rs7741164 at 6p21.1 (per-allele odds ratio (OR) = 1.17; P = 5.8 × 10⁻¹³), rs72658409 at 9p21.3 (per-allele OR = 0.77; P = 1.41 × 10⁻¹⁰) and rs11610143 at 12q13.13 (per-allele OR = 0.89; P = 4.96 × 10⁻⁹). These findings identified new genetic susceptibility alleles for lung cancer in never-smoking women in Asia and merit follow-up to understand their biological underpinnings.

  • Abstract 942: Imputation from the 1000 Genomes Project identifies rare large effect variants of BRCA2-K3326X and CHEK2-I157T as risk factors for lung cancer; a study from the TRICL consortium
    Cancer Research, 2014
    Co-Authors: Maria Teresa Landi, Yufei Wang, James McKay, Thorunn Rafnar, Zhaoming Wang, Maria Timofeeva, Peter Broderick, Kari Stefansson, Angela Risch, Stephen J Chanock
    Abstract:

    We conducted imputation to the 1000 Genomes Project of genome-wide association studies of lung cancer in populations of European ancestry, with 11,348 cases and 15,861 controls from four large studies, including subjects from 13 countries. As a follow-up, we conducted in-silico replication in two studies of 2,303 cases and 27,350 controls and directly genotyped an additional 7,943 cases and 10,945 controls from 14 countries. Data were imputed for all scans for over 10 million SNPs using data from the 1000 Genomes Project (Phase 1 integrated release 3, March 2012) as reference, using IMPUTE2, MaCH or minimac software. Tests of association between imputed SNPs and lung cancer were performed under a probabilistic dosage model in SNPTEST, ProbABEL, MaCH2dat or the glm function in R. The fidelity of imputation, as assessed by the correlation between imputed and directly typed SNPs, was examined in a subset of samples from the four studies used for discovery and showed squared correlation coefficients ranging from 0.74 for the rare CHEK2 variant to 1.00 for the more common TP63 variant. The association between each SNP and lung cancer risk was assessed by the Cochran-Armitage trend test. Principal components generated using common SNPs were used to account for the possibility of inflation. Odds ratios (ORs) and associated 95% confidence intervals (CIs) were calculated by unconditional logistic regression. Meta-analysis was conducted using an inverse-variance approach. Cochran's Q statistic to test for heterogeneity and the I² statistic to quantify the proportion of the total variation due to heterogeneity were calculated. We identified large-effect genome-wide associations for squamous lung cancer with the rare variants of BRCA2-K3326X (rs11571833; odds ratio [OR]=2.47, P=4.74×10⁻²⁰) and of CHEK2-I157T (rs17879961; OR=0.38, P=1.27×10⁻¹³). We also showed an association between common variation at 3q28 (TP63; rs13314271; OR=1.13, P=7.22×10⁻¹⁰) and lung adenocarcinoma previously only reported in Asians. There was no association between these loci and smoking quantity as measured by number of cigarettes smoked per day, using smoking information on 43,693 Icelandic subjects. These findings provide further evidence for inherited genetic susceptibility to lung cancer and its biological basis. Additionally, our analysis demonstrates that imputation can identify rare disease-causing variants having substantive effects on cancer risk from pre-existing GWAS data.

    Citation Format: Maria Teresa Landi, Yufei Wang, James D. McKay, Thorunn Rafnar, Zhaoming Wang, Maria Timofeeva, Peter Broderick, Kari Stefansson, Angela Risch, Stephen J. Chanock, David C. Christiani, Rayjean J. Hung, Paul Brennan, Richard S. Houlston, Christopher I. Amos. Imputation from The 1000 Genomes Project identifies rare large effect variants of BRCA2-K3326X and CHEK2-I157T as risk factors for lung cancer; a study from the TRICL consortium. [abstract]. In: Proceedings of the 105th Annual Meeting of the American Association for Cancer Research; 2014 Apr 5-9; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2014;74(19 Suppl):Abstract nr 942. doi:10.1158/1538-7445.AM2014-942
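
The pooling step named in this abstract (inverse-variance meta-analysis with Cochran's Q and I²) follows standard formulas, sketched below. The abstract does not say whether a fixed-effect or random-effects model was used; this sketch assumes fixed-effect weights, and the per-study odds ratios and standard errors are invented placeholders rather than values from the TRICL studies.

```python
import math


def inverse_variance_meta(log_ors, std_errs):
    """Fixed-effect inverse-variance pooling of per-study log odds ratios.

    Returns (pooled log OR, pooled SE, Cochran's Q, I-squared in percent).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I-squared: share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Hypothetical per-study odds ratios and standard errors of log(OR).
study_ors = [2.6, 2.3, 2.5]
study_ses = [0.20, 0.25, 0.15]
pooled, pooled_se, q, i_squared = inverse_variance_meta(
    [math.log(o) for o in study_ors], study_ses
)
ci_low = math.exp(pooled - 1.96 * pooled_se)
ci_high = math.exp(pooled + 1.96 * pooled_se)
print(f"pooled OR = {math.exp(pooled):.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"Q = {q:.3f}, I^2 = {i_squared:.1f}%")
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is dominated by the most precise studies; a large Q relative to its degrees of freedom would signal that the studies disagree more than chance allows.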

  • Rare variants of large effect in BRCA2 and CHEK2 affect risk of lung cancer
    Nature Genetics, 2014
    Co-Authors: Yufei Wang, James McKay, Thorunn Rafnar, Zhaoming Wang, Maria Timofeeva, Peter Broderick, Xuchen Zong, Marina Laplana, Yongyue Wei, Younghun Han
    Abstract:

    We conducted imputation to the 1000 Genomes Project of four genome-wide association studies of lung cancer in populations of European ancestry (11,348 cases and 15,861 controls) and genotyped an additional 10,246 cases and 38,295 controls for follow-up. We identified large-effect genome-wide associations for squamous lung cancer with the rare variants BRCA2 p.Lys3326X (rs11571833, odds ratio (OR) = 2.47, P = 4.74 × 10⁻²⁰) and CHEK2 p.Ile157Thr (rs17879961, OR = 0.38, P = 1.27 × 10⁻¹³). We also showed an association between common variation at 3q28 (TP63, rs13314271, OR = 1.13, P = 7.22 × 10⁻¹⁰) and lung adenocarcinoma that had been previously reported only in Asians. These findings provide further evidence for inherited genetic susceptibility to lung cancer and its biological basis. Additionally, our analysis demonstrates that imputation can identify rare disease-causing variants with substantive effects on cancer risk from preexisting genome-wide association study data.

Alexej Abyzov - One of the best experts on this subject based on the ideXlab platform.

  • Comprehensive performance comparison of high-resolution array platforms for genome-wide Copy Number Variation (CNV) analysis in humans
    BMC Genomics, 2017
    Co-Authors: Rajini R Haraksingh, Alexej Abyzov, Alexander E Urban
    Abstract:

    High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. A new generation of arrays combining high probe densities with optimized designs will comprise essential tools for genome analysis in the coming years. We systematically compared the genome-wide CNV detection power of all 17 available array designs from the Affymetrix, Agilent, and Illumina platforms by hybridizing the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays, and performing data analysis using both manufacturer-recommended and platform-independent software. We benchmarked the resulting CNV call sets from each array using a gold standard set of CNVs for this genome derived from 1000 Genomes Project whole genome sequencing data. The arrays tested comprise both SNP and aCGH platforms with varying designs and contain between ~0.5 and ~4.6 million probes. Across the arrays CNV detection varied widely in number of CNV calls (4–489), CNV size range (~40 bp to ~8 Mbp), and percentage of non-validated CNVs (0–86%). We discovered strikingly strong effects of specific array design principles on performance. For example, some SNP array designs with the largest numbers of probes and extensive exonic coverage produced a considerable number of CNV calls that could not be validated, compared to designs with probe numbers that are sometimes an order of magnitude smaller. This effect was only partially ameliorated using different analysis software and optimizing data analysis parameters. High-resolution microarrays will continue to be used as reliable, cost- and time-efficient tools for CNV analysis. However, different applications tolerate different limitations in CNV detection. Our study quantified how these arrays differ in total number and size range of detected CNVs as well as sensitivity, and determined how each array balances these attributes. This analysis will inform appropriate array selection for future CNV studies, and allow better assessment of the CNV-analytical power of both published and ongoing array-based genomics studies. Furthermore, our findings emphasize the importance of concurrent use of multiple analysis algorithms and independent experimental validation in array-based CNV detection studies.

  • PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data
    Genome Biology, 2009
    Co-Authors: Jan O Korbel, Zhengdong D Zhang, Michael Snyder, Alexej Abyzov, Nicholas Carriero, Philip Cayting, Mark Gerstein
    Abstract:

    Personal-genomics endeavors, such as the 1000 Genomes Project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer; http://sv.gersteinlab.org/pemer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring-strategy and showed its relative insensitivity to base-calling errors.
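
The general idea behind paired-end mapping, on which tools like the one described here build, is that a read pair whose mapped span differs markedly from the library's expected span hints at a structural variant. The sketch below illustrates only that basic principle with a single standard-deviation cutoff; it is not PEMer's coverage-adjusted multi-cutoff scoring or its simulation-based error model, and all names and numbers are hypothetical.

```python
from statistics import mean, stdev


def classify_pairs(spans, library_spans, cutoff_sd=3.0):
    """Label each mapped span relative to the library's span distribution.

    A span much larger than expected suggests a deletion in the sample;
    a span much smaller than expected suggests an insertion.
    cutoff_sd is an assumed threshold chosen for illustration.
    """
    mu = mean(library_spans)
    sd = stdev(library_spans)
    labels = []
    for span in spans:
        z = (span - mu) / sd
        if z > cutoff_sd:
            labels.append("deletion-like")
        elif z < -cutoff_sd:
            labels.append("insertion-like")
        else:
            labels.append("concordant")
    return labels


# Hypothetical spans (bp) from well-behaved pairs, used to estimate the library distribution.
library_spans = [2900, 3000, 3100, 2950, 3050]
# Hypothetical spans of pairs mapped to the reference genome.
observed_spans = [3020, 5200, 1500]
labels = classify_pairs(observed_spans, library_spans)
print(labels)
```

In practice a single discordant pair is weak evidence, which is why real pipelines cluster supporting pairs and attach confidence values to each candidate variant.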