Relationship Matrix

The Experts below are selected from a list of 94005 Experts worldwide ranked by ideXlab platform

I. Misztal - One of the best experts on this subject based on the ideXlab platform.

  • Core-dependent changes in genomic predictions using the algorithm for proven and young in single-step genomic best linear unbiased prediction
    Journal of Animal Science, 2020
    Co-Authors: I. Misztal, S Tsuruta, Ivan Pocrnic, Daniela Lourenco
    Abstract:

    Single-step genomic best linear unbiased prediction with the algorithm for proven and young (APY) is a popular method for large-scale genomic evaluations. With the APY algorithm, animals are designated as core or noncore, and the computing resources to create the inverse of the genomic relationship matrix (GRM) are reduced by inverting only a portion of that matrix for core animals. However, using different core sets of the same size causes fluctuations in genomic estimated breeding values (GEBV) up to one additive standard deviation without affecting prediction accuracy. About 2% of the variation in the genomic relationship matrix is noise. In the recursion formula for APY, the error term modeling the noise is different for every set of core animals, creating changes in breeding values. While average changes are small, and correlations between breeding values estimated with different core animals are close to 1.0, based on the normal distribution theory, outliers can be several times bigger than the average. Tests included commercial datasets from beef and dairy cattle and from pigs. Beyond a certain number of core animals, the prediction accuracy did not improve, but fluctuations decreased with more animals. Fluctuations were much smaller than the possible changes based on prediction error variance. GEBV change over time even for animals with no new data, because genomic relationships tie all the genotyped animals together, causing reranking of top animals. In contrast, changes in nongenomic models without new data are small. Also, GEBV can change due to details in the model, such as redefinition of contemporary groups or unknown parent groups. In particular, increasing the fraction of blending of the genomic relationship matrix with a pedigree relationship matrix from 5% to 20% caused changes in GEBV up to 0.45 SD, with correlation of GEBV > 0.99. Fluctuations in genomic predictions are part of genomic evaluation models and are also present without the APY algorithm when genomic evaluations are computed with updated data. The best approach to reduce the impact of fluctuations in genomic evaluations is to make selection decisions not on individual animals with limited individual accuracy but on groups of animals with high average accuracy.
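The blending step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common convention that the blended matrix is a weighted sum (1 - alpha) * G + alpha * A22, where A22 is the pedigree relationship matrix among the genotyped animals; the abstract does not give the formula, and the function name `blend_grm` and the toy matrices are hypothetical.

```python
import numpy as np

def blend_grm(G, A22, alpha):
    """Blend a genomic matrix G with a pedigree matrix A22.

    Assumed convention (not stated in the abstract):
    G_blended = (1 - alpha) * G + alpha * A22, with alpha in [0, 1].
    """
    G = np.asarray(G, dtype=float)
    A22 = np.asarray(A22, dtype=float)
    if G.shape != A22.shape:
        raise ValueError("G and A22 must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * G + alpha * A22

# Toy matrices for three genotyped animals (made-up values).
G = np.array([[1.02, 0.48, 0.10],
              [0.48, 0.97, 0.05],
              [0.10, 0.05, 1.01]])
A22 = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.0],
                [0.0, 0.0, 1.0]])

G05 = blend_grm(G, A22, 0.05)  # 5% blending
G20 = blend_grm(G, A22, 0.20)  # 20% blending
print(np.abs(G20 - G05).max())  # largest element-wise change between the two
```

A larger blending fraction pulls every genomic relationship toward its pedigree expectation, which is one way the model details mentioned above can shift GEBV without new data.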

  • Metafounders are related to Fst fixation indices and reduce bias in single-step genomic evaluations
    Genetics Selection Evolution, 2017
    Co-Authors: Carolina A Garciabaccino, I. Misztal, Andres Legarra, Ivan Pocrnic, Ole F Christensen, Zulma G Vitezica, R J C Cantet
    Abstract:

    Metafounders are pseudo-individuals that encapsulate genetic heterozygosity and relationships within and across base pedigree populations, i.e. ancestral populations. This work addresses the estimation and usefulness of metafounder relationships in single-step genomic best linear unbiased prediction (ssGBLUP). We show that ancestral relationship parameters are proportional to standardized covariances of base allelic frequencies across populations, such as $F_{\text{ST}}$ fixation indexes. These covariances of base allelic frequencies can be estimated from marker genotypes of related recent individuals and pedigree. Simple methods for their estimation include naive computation of allele frequencies from marker genotypes or a method of moments that equates average pedigree-based and marker-based relationships. Complex methods include generalized least squares (best linear unbiased estimator (BLUE)) or maximum likelihood based on pedigree relationships. To our knowledge, methods to infer $F_{\text{ST}}$ coefficients from marker data have not been developed for related individuals. We derived a genomic relationship matrix, compatible with pedigree relationships, that is constructed as a cross-product of {−1, 0, 1} codes and that is equivalent (apart from scale factors) to an identity-by-state relationship matrix at genome-wide markers. Using a simulation with a single population under selection in which only males and youngest animals are genotyped, we observed that generalized least squares or maximum likelihood gave accurate and unbiased estimates of the ancestral relationship parameter (true value: 0.40) whereas the naive method and the method of moments were biased (average estimates of 0.43 and 0.35). We also observed that genomic evaluation by ssGBLUP using metafounders was less biased in terms of estimates of genetic trend (bias of 0.01 instead of 0.12), yielded estimates of breeding values that were less overdispersed (0.94 instead of 0.99) and as accurate (0.74) as those from ssGBLUP without metafounders, and provided consistent estimates of heritability. Estimation of metafounder relationships can be achieved using BLUP-like methods with pedigree and markers. Inclusion of metafounder relationships reduces bias of genomic predictions with no loss in accuracy.
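The cross-product construction described in this abstract can be sketched as follows. The abstract fixes only the {−1, 0, 1} coding and says the result is defined apart from scale factors. Dividing by half the number of markers is therefore an assumption here (it corresponds to using allele frequencies of 0.5 in the usual scaling), and the function name and genotypes are hypothetical.

```python
import numpy as np

def grm_from_centered_codes(M):
    """Genomic relationship matrix from a cross-product of {-1, 0, 1} codes.

    M: (animals x markers) array of allele counts coded 0, 1, 2.
    The divisor m / 2 is an assumed scale factor, not taken from the abstract.
    """
    M = np.asarray(M, dtype=float)
    Z = M - 1.0                    # recode 0/1/2 as -1/0/1
    m = M.shape[1]                 # number of markers
    return Z @ Z.T / (m / 2.0)

# Three animals, four markers (made-up genotypes).
M = np.array([[0, 1, 2, 2],
              [2, 1, 0, 0],
              [1, 1, 1, 1]])
G = grm_from_centered_codes(M)
print(G)
```

In this toy case the first two animals carry opposite homozygous genotypes, so their relationship is negative, while the fully heterozygous third animal has zero entries under this coding.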

  • Technical note: Avoiding the direct inversion of the numerator relationship matrix for genotyped animals in single-step genomic best linear unbiased prediction solved with the preconditioned conjugate gradient
    Journal of Animal Science, 2017
    Co-Authors: Yutaka Masuda, I. Misztal, Andres Legarra, S Tsuruta, B O Fragomeni, D A L Lourenco, I. Aguilar
    Abstract:

    This paper evaluates an efficient implementation to multiply the inverse of a numerator relationship matrix for genotyped animals ($\mathbf{A}_{22}^{-1}$) by a vector ($\mathbf{q}$). The computation is required for solving mixed model equations in single-step genomic BLUP (ssGBLUP) with the preconditioned conjugate gradient (PCG). The inverse $\mathbf{A}_{22}^{-1}$ can be decomposed into sparse matrices that are blocks of the sparse inverse of a numerator relationship matrix ($\mathbf{A}^{-1}$) including genotyped animals and their ancestors. The elements of $\mathbf{A}^{-1}$ were rapidly calculated with Henderson's rule and stored as sparse matrices in memory. Implementation of $\mathbf{A}_{22}^{-1}\mathbf{q}$ was by a series of sparse matrix-vector multiplications. Diagonal elements of $\mathbf{A}_{22}^{-1}$, which were required as preconditioners in PCG, were approximated with a Monte Carlo method using 1,000 samples. The efficient implementation of $\mathbf{A}_{22}^{-1}\mathbf{q}$ was compared with explicit inversion of $\mathbf{A}_{22}$ with 3 data sets including about 15,000, 81,000, and 570,000 genotyped animals selected from populations with 213,000, 8.2 million, and 10.7 million pedigree animals, respectively. The explicit inversion required 1.8 GB, 49 GB, and 2,415 GB (estimated) of memory, respectively, and 42 s, 56 min, and 13.5 d (estimated), respectively, for the computations. The efficient implementation required <1 MB, 2.9 GB, and 2.3 GB of memory, respectively, and <1 s, 3 min, and 5 min, respectively, for setting up. Only <1 s was required for the multiplication in each PCG iteration for any data set. When the equations in ssGBLUP are solved with the PCG algorithm, $\mathbf{A}_{22}^{-1}$ is no longer a limiting factor in the computations.
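One standard way to multiply by the inverse of a sub-block without forming it is the partitioned-inverse (Schur complement) identity: if the inverse of the full matrix is split into blocks indexed 1 (nongenotyped ancestors) and 2 (genotyped animals), the inverse of the genotyped block equals block 22 of the full inverse minus block 21 times the inverse of block 11 times block 12. Whether this matches the paper's exact implementation is an assumption. The sketch below checks the identity numerically on a small dense, random positive-definite stand-in for the relationship matrix, whereas a real implementation would keep the blocks sparse.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 4, 3                      # ancestors, genotyped animals (toy sizes)
n = n1 + n2

# Random symmetric positive-definite stand-in for the relationship matrix.
B = rng.normal(size=(n, n))
A = B @ B.T + n * np.eye(n)

# Blocks of the full inverse (dense here, sparse in practice).
Ainv = np.linalg.inv(A)
Ai11, Ai12 = Ainv[:n1, :n1], Ainv[:n1, n1:]
Ai21, Ai22 = Ainv[n1:, :n1], Ainv[n1:, n1:]

def a22inv_times(q):
    """Return inv(A[genotyped, genotyped]) @ q using only blocks of inv(A)."""
    t = Ai12 @ q                   # matrix-vector product
    t = np.linalg.solve(Ai11, t)   # solve with the ancestor block
    return Ai22 @ q - Ai21 @ t     # two more matrix-vector products

q = rng.normal(size=n2)
direct = np.linalg.solve(A[n1:, n1:], q)   # explicit route, for comparison
print(np.allclose(a22inv_times(q), direct))
```

The product is assembled from matrix-vector operations and one linear solve, so the dense inverse of the genotyped block is never stored.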

  • Metafounders are Fst fixation indices and reduce bias in single-step genomic evaluations
    bioRxiv, 2016
    Co-Authors: Carolina A Garciabaccino, I. Misztal, Andres Legarra, Ivan Pocrnic, Ole F Christensen, Zulma G Vitezica, R J C Cantet
    Abstract:

    BACKGROUND: Metafounders are pseudo-individuals that condense the genetic heterozygosity and relationships within and across base pedigree populations, i.e. ancestral populations. This work addresses estimation and usefulness of metafounder relationships in Single Step GBLUP. RESULTS: We show that the ancestral relationship parameters are proportional to standardized covariances of base allelic frequencies across populations, like Fst fixation indexes. These covariances of base allelic frequencies can be estimated from marker genotypes of related recent individuals, and pedigree. Simple methods for estimation include naive computation of allele frequencies from marker genotypes or a method of moments equating average pedigree-based and marker-based relationships. Complex methods include generalized least squares or maximum likelihood based on pedigree relationships. To our knowledge, methods to infer Fst coefficients and Fst differentiation have not been developed for related populations. A compatible genomic relationship matrix constructed as a cross-product of {−1, 0, 1} codes, and equivalent (up to scale factors) to an identity-by-state relationship matrix at the markers, is derived. Using a simulation with a single population under selection, in which only males and youngest animals were genotyped, we observed that generalized least squares or maximum likelihood gave accurate and unbiased estimates of the ancestral relationship parameter (true value: 0.40) whereas the other two (naive and method of moments) were biased (estimates of 0.43 and 0.35). We also observed that genomic evaluation by Single Step GBLUP using metafounders was less biased in terms of genetic trend (bias of 0.01 instead of 0.12), slightly overdispersed (0.94 instead of 0.99), and as accurate (0.74) as the regular Single Step GBLUP. Single Step GBLUP using metafounders also provided consistent estimates of heritability. CONCLUSIONS: Estimation of metafounder relationships can be achieved using BLUP-like methods with pedigree and markers. Inclusion of metafounder relationships reduces bias of genomic predictions with no loss in accuracy.

  • Crossbreed evaluations in single-step genomic best linear unbiased predictor using adjusted realized relationship matrices
    Journal of Animal Science, 2016
    Co-Authors: Daniela Lourenco, S Tsuruta, B O Fragomeni, Chingyi Chen, W O Herring, I. Misztal
    Abstract:

    Combining purebreed and crossbreed information is beneficial for genetic evaluation of some livestock species. Genetic evaluations can use Relationships based on genomic information, relying on allele frequencies that are breed specific. Single-step genomic BLUP (ssGBLUP) does not account for different allele frequencies, which could limit the genetic gain in crossbreed evaluations. In this study, we tested the performance of different breed-specific genomic Relationship matrices () in ssGBLUP for crossbreed evaluations; we also tested the importance of genotyping crossbred animals. Genotypes were available for purebreeds (AA and BB) and crossbreeds (F) in simulated and real pig populations. The number of genotyped animals was, on average, 4,315 for the simulated population and 15,798 for the real population. Cross-validation was performed on 1,200 and 3,117 F animals in the simulated and real populations, respectively. Simulated scenarios were under no artificial selection, mass selection, or BLUP selection. Two genomic Relationship matrices were constructed based on breed-specific allele frequencies: 1) , a genomic Relationship Matrix centered by breed-specific allele frequencies, and 2) , a genomic Relationship Matrix centered and scaled by breed-specific allele frequencies. All (the across-breed genomic Relationship Matrix), , and were also tuned to account for selective genotyping. Using breed-specific allele frequencies reduced the number of negative Relationships between 2 purebreeds, pulling the average closer to 0, as in the pedigree-based Relationship Matrix. For simulated populations that included mass selection, genomic EBV (GEBV) in F, when using and , were, on average, 10% more accurate than ; however, after tuning to account for selective genotyping, provided the same accuracy as for breed-specific genomic Relationship matrices. 
For the real population, accuracies for litter size in F were 0.62 for , , and , and tuning had no impact on accuracy, except for , which was 1 percentage point less accurate. Accuracy of GEBV for number of stillborns in F1 was 0.5 for all tested genomic Relationship matrices with no changes after tuning. We observed that genotyping F increased accuracies of GEBV for the same animals by up to 39% compared with having genotypes for only AA and BB. In crossbreed evaluations, accounting for breed-specific allele frequencies promoted changes in G that were not influential enough to improve accuracy of GEBV. Therefore, the best performance of ssGBLUP for crossbreed evaluations requires genotypes for pure- and crossbreeds and no breed-specific adjustments in the realized Relationship Matrix.

I. Aguilar - One of the best experts on this subject based on the ideXlab platform.

  • Technical note: Avoiding the direct inversion of the numerator relationship matrix for genotyped animals in single-step genomic best linear unbiased prediction solved with the preconditioned conjugate gradient
    Journal of Animal Science, 2017
    Co-Authors: Yutaka Masuda, I. Misztal, Andres Legarra, S Tsuruta, B O Fragomeni, D A L Lourenco, I. Aguilar
    Abstract:

    This paper evaluates an efficient implementation to multiply the inverse of a numerator relationship matrix for genotyped animals ($\mathbf{A}_{22}^{-1}$) by a vector ($\mathbf{q}$). The computation is required for solving mixed model equations in single-step genomic BLUP (ssGBLUP) with the preconditioned conjugate gradient (PCG). The inverse $\mathbf{A}_{22}^{-1}$ can be decomposed into sparse matrices that are blocks of the sparse inverse of a numerator relationship matrix ($\mathbf{A}^{-1}$) including genotyped animals and their ancestors. The elements of $\mathbf{A}^{-1}$ were rapidly calculated with Henderson's rule and stored as sparse matrices in memory. Implementation of $\mathbf{A}_{22}^{-1}\mathbf{q}$ was by a series of sparse matrix-vector multiplications. Diagonal elements of $\mathbf{A}_{22}^{-1}$, which were required as preconditioners in PCG, were approximated with a Monte Carlo method using 1,000 samples. The efficient implementation of $\mathbf{A}_{22}^{-1}\mathbf{q}$ was compared with explicit inversion of $\mathbf{A}_{22}$ with 3 data sets including about 15,000, 81,000, and 570,000 genotyped animals selected from populations with 213,000, 8.2 million, and 10.7 million pedigree animals, respectively. The explicit inversion required 1.8 GB, 49 GB, and 2,415 GB (estimated) of memory, respectively, and 42 s, 56 min, and 13.5 d (estimated), respectively, for the computations. The efficient implementation required <1 MB, 2.9 GB, and 2.3 GB of memory, respectively, and <1 s, 3 min, and 5 min, respectively, for setting up. Only <1 s was required for the multiplication in each PCG iteration for any data set. When the equations in ssGBLUP are solved with the PCG algorithm, $\mathbf{A}_{22}^{-1}$ is no longer a limiting factor in the computations.
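For readers unfamiliar with the solver named in this abstract, here is a minimal, generic sketch of the preconditioned conjugate gradient with a diagonal (Jacobi) preconditioner. It is a textbook version, not the authors' code: the function name `pcg`, the tolerance, and the random test system are assumptions, and the coefficient matrix is dense only for brevity.

```python
import numpy as np

def pcg(C, b, diag, tol=1e-10, max_iter=1000):
    """Solve C x = b for symmetric positive-definite C.

    diag holds the diagonal elements used as the preconditioner.
    Returns the solution and the number of rounds performed.
    """
    x = np.zeros_like(b, dtype=float)
    r = b - C @ x                  # initial residual
    z = r / diag                   # apply the diagonal preconditioner
    p = z.copy()
    rz = r @ z
    for it in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        Cp = C @ p                 # the only place C is needed
        step = rz / (p @ Cp)
        x += step * p
        r -= step * Cp
        z = r / diag
        rz_new = r @ z
        p = z + (rz_new / rz) * p  # new search direction
        rz = rz_new
    return x, max_iter

# Small random symmetric positive-definite system (made-up data).
rng = np.random.default_rng(1)
B = rng.normal(size=(5, 5))
C = B @ B.T + 5.0 * np.eye(5)
b = rng.normal(size=5)
x, n_rounds = pcg(C, b, np.diag(C))
print(n_rounds, np.allclose(C @ x, b))
```

Because the matrix enters only through the product `C @ p`, the algorithm can run without ever storing the matrix explicitly, which is why a fast matrix-vector routine is sufficient in each iteration.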

  • Implementation of genomic recursions in single-step genomic best linear unbiased predictor for US Holsteins with a large number of genotyped animals
    Journal of Dairy Science, 2016
    Co-Authors: Yutaka Masuda, I. Misztal, Andres Legarra, I. Aguilar, Daniela Lourenco, S Tsuruta, B O Fragomeni, T J Lawlor
    Abstract:

    The objectives of this study were to develop and evaluate an efficient implementation of the computation of the inverse of the genomic relationship matrix with the recursion algorithm, called the algorithm for proven and young (APY), in single-step genomic BLUP. We validated genomic predictions for young bulls with more than 500,000 genotyped animals in final score for US Holsteins. Phenotypic data included 11,626,576 final scores on 7,093,380 US Holstein cows, and genotypes were available for 569,404 animals. Daughter deviations for young bulls with no classified daughters in 2009, but at least 30 classified daughters in 2014, were computed using all the phenotypic data. Genomic predictions for the same bulls were calculated with single-step genomic BLUP using phenotypes up to 2009. We calculated the inverse of the genomic relationship matrix, $\mathbf{G}_{\text{APY}}^{-1}$, based on a direct inversion of the genomic relationship matrix on a small subset of genotyped animals (core animals) and extended that information to noncore animals by recursion. We tested several sets of core animals including 9,406 bulls with at least 1 classified daughter, 9,406 bulls and 1,052 classified dams of bulls, 9,406 bulls and 7,422 classified cows, and random samples of 5,000 to 30,000 animals. Validation reliability was assessed by the coefficient of determination from regression of daughter deviation on genomic predictions for the predicted young bulls. The reliabilities were 0.39 with 5,000 randomly chosen core animals, 0.45 with the 9,406 bulls and 7,422 cows as core animals, and 0.44 with the remaining sets. With phenotypes truncated in 2009 and the preconditioned conjugate gradient to solve mixed model equations, the number of rounds to convergence for core animals defined by bulls was 1,343; defined by bulls and cows, 2,066; and defined by 10,000 random animals, at most 1,629. With complete phenotype data, the number of rounds decreased to 858, 1,299, and at most 1,092, respectively. Setting up $\mathbf{G}_{\text{APY}}^{-1}$ for 569,404 genotyped animals with 10,000 core animals took 1.3 h and 57 GB of memory. The validation reliability with APY reaches a plateau when the number of core animals is at least 10,000. Predictions with APY differ little in reliability among definitions of core animals. Single-step genomic BLUP with APY is applicable to millions of genotyped animals.
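The validation statistic described in this abstract, the coefficient of determination from a regression of daughter deviations on genomic predictions, can be sketched as below. The sketch assumes a simple unweighted linear regression with an intercept (the abstract does not say whether weights were used). The function name and the numbers are hypothetical.

```python
import numpy as np

def validation_reliability(daughter_dev, gebv):
    """R-squared from regressing daughter deviations on genomic predictions.

    Unweighted simple linear regression with an intercept (an assumption).
    """
    y = np.asarray(daughter_dev, dtype=float)
    x = np.asarray(gebv, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # least-squares line
    fitted = intercept + slope * x
    ss_res = np.sum((y - fitted) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up values for six validation bulls.
gebv = np.array([0.8, 1.5, -0.3, 0.2, 1.1, -0.9])
dd = np.array([1.0, 1.2, 0.1, -0.4, 1.6, -0.7])
r2 = validation_reliability(dd, gebv)
print(round(r2, 3))
```

For a single predictor with an intercept, this value equals the squared correlation between the two series.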

  • Hot topic: Use of genomic recursions in single-step genomic best linear unbiased predictor (BLUP) with a large number of genotypes
    Journal of Dairy Science, 2015
    Co-Authors: B O Fragomeni, Andres Legarra, I. Aguilar, Daniela Lourenco, S Tsuruta, Yutaka Masuda, T J Lawlor, I. Misztal
    Abstract:

    The purpose of this study was to evaluate the accuracy of genomic selection in single-step genomic BLUP (ssGBLUP) when the inverse of the genomic relationship matrix (G) is derived by the "algorithm for proven and young animals" (APY). This algorithm implements genomic recursions on a subset of "proven" animals. Only a relationship matrix for animals treated as "proven" needs to be inverted, and the extra costs of adding animals treated as "young" are linear. Analyses involved 10,102,702 final scores on 6,930,618 Holstein cows. Final score, which is a composite of type traits, is a popular trait in the United States and was easily available for this study. A total of 100,000 animals with genotypes were used in the analyses and included 23,000 sires (16,000 with >5 progeny), 27,000 cows, and 50,000 young animals. Genomic EBV (GEBV) were calculated with a regular inverse of G, and with the G inverse approximated by APY. Animals in the proven subset included only sires (23,000), sires + cows (50,000), only cows (27,000), or sires with >5 progeny (16,000). The correlations of GEBV with APY and regular GEBV for young genotyped animals were 0.994, 0.995, 0.992, and 0.992, respectively. Later, animals in the proven subset were randomly sampled from all genotyped animals in sets of 2,000, 5,000, 10,000, 15,000, and 20,000; each sample was replicated 4 times. Respective correlations were 0.97 (5,000 sample), 0.98 (10,000 sample), and 0.99 (20,000 sample), with minimal difference between samples of the same size. Genomic EBV with APY were accurate when the number of animals used in the subset was between 10,000 and 20,000, with little difference between the ways of creating the subset. Due to the approximately linear cost of APY, ssGBLUP with APY could support any number of genotyped animals without affecting accuracy.

  • Single step, a general approach for genomic selection
    Livestock Science, 2014
    Co-Authors: Andres Legarra, I. Aguilar, Ole F Christensen, I. Misztal
    Abstract:

    Genomic evaluation methods assume that the reference population is genotyped and phenotyped. This is most often false and the generation of pseudo-phenotypes is uncertain and inaccurate. However, markers obey transmission rules and therefore the covariances of marker genotypes across individuals can be modelled using pedigree Relationships. Based on this, an extension of the genomic Relationship Matrix can be constructed in which genomic Relationships are propagated to all individuals, resulting in a combined Relationship Matrix, which can be used in a BLUP procedure called the Single Step Genomic BLUP. This procedure provides so far the most comprehensive option for genomic evaluation. Several extensions, options and details are described: compatibility of genomic and pedigree Relationships, Bayesian regressions, multiple trait models, computational aspects, etc. Many details scattered through a series of papers are put together into this paper.

  • Using recursion to compute the inverse of the genomic relationship matrix
    Journal of Dairy Science, 2014
    Co-Authors: I. Misztal, Andres Legarra, I. Aguilar
    Abstract:

    Computing the inverse of the genomic relationship matrix using recursion was investigated. A traditional algorithm to invert the numerator relationship matrix is based on the observation that the conditional expectation for an additive effect of 1 animal given the effects of all other animals depends on the effects of its sire and dam only, each with a coefficient of 0.5. With genomic relationships, such an expectation depends on all other genotyped animals, and the coefficients do not have any set value. For each animal, the coefficients plus the conditional variance can be called a genomic recursion. If such recursions are known, the mixed model equations can be solved without explicitly creating the inverse of the genomic relationship matrix. Several algorithms were developed to create genomic recursions. In an algorithm with sequential updates, genomic recursions are created animal by animal. That algorithm can also be used to update a known inverse of a genomic relationship matrix for additional genotypes. In an algorithm with forward updates, a newly computed recursion is immediately applied to update recursions for remaining animals. The computing costs for both algorithms depend on the sparsity pattern of the genomic recursions, but are lower than or equal to those of regular inversion. An algorithm for proven and young animals assumes that the genomic recursions for young animals contain coefficients only for proven animals. Such an algorithm generates exact genomic EBV in genomic BLUP and is an approximation in single-step genomic BLUP. That algorithm has a cubic cost for the number of proven animals and a linear cost for the number of young animals. The genomic recursions can provide new insight into genomic evaluation and possibly reduce costs of genetic predictions with extremely large numbers of genotypes.
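The algorithm for proven and young animals described in this abstract can be sketched in a few lines, under the assumption that each young (noncore) animal is regressed only on the proven (core) animals and that the conditional variances of young animals are independent of each other. The function name `apy_inverse` and the simulated matrix are hypothetical, and the dense arrays are for illustration only.

```python
import numpy as np

def apy_inverse(G, core):
    """Sparse-structured inverse of G with recursions on core animals only.

    Only the core block is inverted (cubic cost in the number of core
    animals); each noncore animal adds one recursion (linear cost).
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                          # recursion coefficients
    # Conditional variance of each noncore animal given the core animals.
    m = np.diag(G)[noncore] - np.einsum("ij,ij->j", Gcn, P)

    out = np.zeros((n, n))
    out[np.ix_(core, core)] = Gcc_inv + (P / m) @ P.T
    out[np.ix_(core, noncore)] = -P / m
    out[np.ix_(noncore, core)] = (-P / m).T
    out[noncore, noncore] = 1.0 / m            # diagonal noncore block
    return out

# Made-up positive-definite "genomic" matrix for four animals.
rng = np.random.default_rng(2)
Z = rng.normal(size=(4, 50))
G = Z @ Z.T / 50 + 0.01 * np.eye(4)

exact = apy_inverse(G, core=[0, 1, 2])   # one noncore animal: exact inverse
approx = apy_inverse(G, core=[0, 1])     # two noncore animals: approximation
print(np.allclose(exact, np.linalg.inv(G)))
```

With a single noncore animal the result coincides with the regular inverse. With more noncore animals, the relationships among them that are not explained by the core animals are ignored, which is where the approximation enters.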

Andres Legarra - One of the best experts on this subject based on the ideXlab platform.

  • Metafounders are related to Fst fixation indices and reduce bias in single-step genomic evaluations
    Genetics Selection Evolution, 2017
    Co-Authors: Carolina A Garciabaccino, I. Misztal, Andres Legarra, Ivan Pocrnic, Ole F Christensen, Zulma G Vitezica, R J C Cantet
    Abstract:

    Metafounders are pseudo-individuals that encapsulate genetic heterozygosity and relationships within and across base pedigree populations, i.e. ancestral populations. This work addresses the estimation and usefulness of metafounder relationships in single-step genomic best linear unbiased prediction (ssGBLUP). We show that ancestral relationship parameters are proportional to standardized covariances of base allelic frequencies across populations, such as $F_{\text{ST}}$ fixation indexes. These covariances of base allelic frequencies can be estimated from marker genotypes of related recent individuals and pedigree. Simple methods for their estimation include naive computation of allele frequencies from marker genotypes or a method of moments that equates average pedigree-based and marker-based relationships. Complex methods include generalized least squares (best linear unbiased estimator (BLUE)) or maximum likelihood based on pedigree relationships. To our knowledge, methods to infer $F_{\text{ST}}$ coefficients from marker data have not been developed for related individuals. We derived a genomic relationship matrix, compatible with pedigree relationships, that is constructed as a cross-product of {−1, 0, 1} codes and that is equivalent (apart from scale factors) to an identity-by-state relationship matrix at genome-wide markers. Using a simulation with a single population under selection in which only males and youngest animals are genotyped, we observed that generalized least squares or maximum likelihood gave accurate and unbiased estimates of the ancestral relationship parameter (true value: 0.40) whereas the naive method and the method of moments were biased (average estimates of 0.43 and 0.35). We also observed that genomic evaluation by ssGBLUP using metafounders was less biased in terms of estimates of genetic trend (bias of 0.01 instead of 0.12), yielded estimates of breeding values that were less overdispersed (0.94 instead of 0.99) and as accurate (0.74) as those from ssGBLUP without metafounders, and provided consistent estimates of heritability. Estimation of metafounder relationships can be achieved using BLUP-like methods with pedigree and markers. Inclusion of metafounder relationships reduces bias of genomic predictions with no loss in accuracy.

  • Technical note: Avoiding the direct inversion of the numerator relationship matrix for genotyped animals in single-step genomic best linear unbiased prediction solved with the preconditioned conjugate gradient
    Journal of Animal Science, 2017
    Co-Authors: Yutaka Masuda, I. Misztal, Andres Legarra, S Tsuruta, B O Fragomeni, D A L Lourenco, I. Aguilar
    Abstract:

    This paper evaluates an efficient implementation to multiply the inverse of a numerator relationship matrix for genotyped animals ($\mathbf{A}_{22}^{-1}$) by a vector ($\mathbf{q}$). The computation is required for solving mixed model equations in single-step genomic BLUP (ssGBLUP) with the preconditioned conjugate gradient (PCG). The inverse $\mathbf{A}_{22}^{-1}$ can be decomposed into sparse matrices that are blocks of the sparse inverse of a numerator relationship matrix ($\mathbf{A}^{-1}$) including genotyped animals and their ancestors. The elements of $\mathbf{A}^{-1}$ were rapidly calculated with Henderson's rule and stored as sparse matrices in memory. Implementation of $\mathbf{A}_{22}^{-1}\mathbf{q}$ was by a series of sparse matrix-vector multiplications. Diagonal elements of $\mathbf{A}_{22}^{-1}$, which were required as preconditioners in PCG, were approximated with a Monte Carlo method using 1,000 samples. The efficient implementation of $\mathbf{A}_{22}^{-1}\mathbf{q}$ was compared with explicit inversion of $\mathbf{A}_{22}$ with 3 data sets including about 15,000, 81,000, and 570,000 genotyped animals selected from populations with 213,000, 8.2 million, and 10.7 million pedigree animals, respectively. The explicit inversion required 1.8 GB, 49 GB, and 2,415 GB (estimated) of memory, respectively, and 42 s, 56 min, and 13.5 d (estimated), respectively, for the computations. The efficient implementation required <1 MB, 2.9 GB, and 2.3 GB of memory, respectively, and <1 s, 3 min, and 5 min, respectively, for setting up. Only <1 s was required for the multiplication in each PCG iteration for any data set. When the equations in ssGBLUP are solved with the PCG algorithm, $\mathbf{A}_{22}^{-1}$ is no longer a limiting factor in the computations.

  • Metafounders are Fst fixation indices and reduce bias in single-step genomic evaluations
    bioRxiv, 2016
    Co-Authors: Carolina A Garciabaccino, I. Misztal, Andres Legarra, Ivan Pocrnic, Ole F Christensen, Zulma G Vitezica, R J C Cantet
    Abstract:

    BACKGROUND: Metafounders are pseudo-individuals that condense the genetic heterozygosity and relationships within and across base pedigree populations, i.e. ancestral populations. This work addresses estimation and usefulness of metafounder relationships in Single Step GBLUP. RESULTS: We show that the ancestral relationship parameters are proportional to standardized covariances of base allelic frequencies across populations, like Fst fixation indexes. These covariances of base allelic frequencies can be estimated from marker genotypes of related recent individuals, and pedigree. Simple methods for estimation include naive computation of allele frequencies from marker genotypes or a method of moments equating average pedigree-based and marker-based relationships. Complex methods include generalized least squares or maximum likelihood based on pedigree relationships. To our knowledge, methods to infer Fst coefficients and Fst differentiation have not been developed for related populations. A compatible genomic relationship matrix constructed as a cross-product of {−1, 0, 1} codes, and equivalent (up to scale factors) to an identity-by-state relationship matrix at the markers, is derived. Using a simulation with a single population under selection, in which only males and youngest animals were genotyped, we observed that generalized least squares or maximum likelihood gave accurate and unbiased estimates of the ancestral relationship parameter (true value: 0.40) whereas the other two (naive and method of moments) were biased (estimates of 0.43 and 0.35). We also observed that genomic evaluation by Single Step GBLUP using metafounders was less biased in terms of genetic trend (bias of 0.01 instead of 0.12), slightly overdispersed (0.94 instead of 0.99), and as accurate (0.74) as the regular Single Step GBLUP. Single Step GBLUP using metafounders also provided consistent estimates of heritability. CONCLUSIONS: Estimation of metafounder relationships can be achieved using BLUP-like methods with pedigree and markers. Inclusion of metafounder relationships reduces bias of genomic predictions with no loss in accuracy.

  • Implementation of genomic recursions in single-step genomic best linear unbiased predictor for US Holsteins with a large number of genotyped animals
    Journal of Dairy Science, 2016
    Co-Authors: Yutaka Masuda, I. Misztal, Andres Legarra, I. Aguilar, Daniela Lourenco, S Tsuruta, B O Fragomeni, T J Lawlor
    Abstract:

    The objectives of this study were to develop and evaluate an efficient implementation in the computation of the inverse of genomic Relationship Matrix with the recursion algorithm, called the algorithm for proven and young (APY), in single-step genomic BLUP. We validated genomic predictions for young bulls with more than 500,000 genotyped animals in final score for US Holsteins. Phenotypic data included 11,626,576 final scores on 7,093,380 US Holstein cows, and genotypes were available for 569,404 animals. Daughter deviations for young bulls with no classified daughters in 2009, but at least 30 classified daughters in 2014 were computed using all the phenotypic data. Genomic predictions for the same bulls were calculated with single-step genomic BLUP using phenotypes up to 2009. We calculated the inverse of the genomic Relationship Matrix GAPY(-1) based on a direct inversion of genomic Relationship Matrix on a small subset of genotyped animals (core animals) and extended that information to noncore animals by recursion. We tested several sets of core animals including 9,406 bulls with at least 1 classified daughter, 9,406 bulls and 1,052 classified dams of bulls, 9,406 bulls and 7,422 classified cows, and random samples of 5,000 to 30,000 animals. Validation reliability was assessed by the coefficient of determination from regression of daughter deviation on genomic predictions for the predicted young bulls. The reliabilities were 0.39 with 5,000 randomly chosen core animals, 0.45 with the 9,406 bulls, and 7,422 cows as core animals, and 0.44 with the remaining sets. With phenotypes truncated in 2009 and the preconditioned conjugate gradient to solve mixed model equations, the number of rounds to convergence for core animals defined by bulls was 1,343; defined by bulls and cows, 2,066; and defined by 10,000 random animals, at most 1,629. With complete phenotype data, the number of rounds decreased to 858, 1,299, and at most 1,092, respectively. 
    Setting up GAPY(-1) for 569,404 genotyped animals with 10,000 core animals took 1.3 h and 57 GB of memory. The validation reliability with APY reaches a plateau when the number of core animals is at least 10,000. Predictions with APY differ little in reliability among definitions of core animals. Single-step genomic BLUP with APY is applicable to millions of genotyped animals.
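
The core/noncore construction described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes the commonly published APY form, in which the core block is inverted directly and each noncore animal contributes one diagonal element m_i = g_ii - g_ic Gcc^-1 g_ci. The names `mat_inv`, `mat_mul`, and `apy_inverse` are hypothetical, and dense pure-Python lists are used only for readability.

```python
def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mat_mul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apy_inverse(G, core):
    """Sparse-structured inverse of G given a list of core animal indices (sketch)."""
    n = len(G)
    noncore = [i for i in range(n) if i not in core]
    Gcc_inv = mat_inv([[G[i][j] for j in core] for i in core])  # only inversion needed
    Gcn = [[G[i][j] for j in noncore] for i in core]
    P = mat_mul(Gcc_inv, Gcn) if noncore else []                # Gcc^-1 Gcn
    # One recursion error variance per noncore animal.
    m = [G[i][i] - sum(G[i][c] * P[k][col] for k, c in enumerate(core))
         for col, i in enumerate(noncore)]
    inv = [[0.0] * n for _ in range(n)]
    for a, i in enumerate(core):
        for b, j in enumerate(core):
            inv[i][j] = Gcc_inv[a][b] + sum(
                P[a][k] * P[b][k] / m[k] for k in range(len(noncore)))
        for k, j in enumerate(noncore):
            inv[i][j] = inv[j][i] = -P[a][k] / m[k]
    for k, j in enumerate(noncore):
        inv[j][j] = 1.0 / m[k]                                  # noncore block is diagonal
    return inv
```

With a single noncore animal the sketch reproduces the exact inverse of G, which is a convenient sanity check; with more noncore animals the noncore block stays diagonal, which is where the memory savings come from.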

  • hot topic use of genomic recursions in single step genomic best linear unbiased predictor blup with a large number of genotypes
    Journal of Dairy Science, 2015
    Co-Authors: B O Fragomeni, Andres Legarra, I. Aguilar, Daniela Lourenco, S Tsuruta, Yutaka Masuda, T J Lawlor, I. Misztal
    Abstract:

    The purpose of this study was to evaluate the accuracy of genomic selection in single-step genomic BLUP (ssGBLUP) when the inverse of the genomic Relationship Matrix (G) is derived by the "algorithm for proven and young animals" (APY). This algorithm implements genomic recursions on a subset of "proven" animals. Only a Relationship Matrix for animals treated as "proven" needs to be inverted, and the extra costs of adding animals treated as "young" are linear. Analyses involved 10,102,702 final scores on 6,930,618 Holstein cows. Final score, which is a composite of type traits, is a popular trait in the United States and was easily available for this study. A total of 100,000 animals with genotypes were used in the analyses and included 23,000 sires (16,000 with >5 progeny), 27,000 cows, and 50,000 young animals. Genomic EBV (GEBV) were calculated with a regular inverse of G, and with the G inverse approximated by APY. Animals in the proven subset included only sires (23,000), sires + cows (50,000), only cows (27,000), or sires with >5 progeny (16,000). The correlations of GEBV with APY and regular GEBV for young genotyped animals were 0.994, 0.995, 0.992, and 0.992, respectively. Later, animals in the proven subset were randomly sampled from all genotyped animals in sets of 2,000, 5,000, 10,000, 15,000, and 20,000; each sample was replicated 4 times. Respective correlations were 0.97 (5,000 sample), 0.98 (10,000 sample), and 0.99 (20,000 sample), with minimal difference between samples of the same size. Genomic EBV with APY were accurate when the number of animals used in the subset was between 10,000 and 20,000, with little difference between the ways of creating the subset. Due to the approximately linear cost of APY, ssGBLUP with APY could support any number of genotyped animals without affecting accuracy.
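
For orientation, the APY inverse is usually written as below. The subscripts p ("proven") and y ("young") are notation chosen here for illustration and do not appear in the abstract. Because the matrix M_yy is diagonal, each additional "young" animal adds one row of the recursion and one diagonal element, which is why the extra cost is described as linear.

```latex
\mathbf{G}_{APY}^{-1} =
\begin{bmatrix} \mathbf{G}_{pp}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}
+
\begin{bmatrix} -\mathbf{G}_{pp}^{-1}\mathbf{G}_{py} \\ \mathbf{I} \end{bmatrix}
\mathbf{M}_{yy}^{-1}
\begin{bmatrix} -\mathbf{G}_{yp}\mathbf{G}_{pp}^{-1} & \mathbf{I} \end{bmatrix},
\qquad
m_{ii} = g_{ii} - \mathbf{g}_{ip}\,\mathbf{G}_{pp}^{-1}\,\mathbf{g}_{pi}
```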

Ole F Christensen - One of the best experts on this subject based on the ideXlab platform.

  • metafounders are related to f st fixation indices and reduce bias in single step genomic evaluations
    Genetics Selection Evolution, 2017
    Co-Authors: Carolina A Garciabaccino, I. Misztal, Andres Legarra, Ivan Pocrnic, Ole F Christensen, Zulma G Vitezica, R J C Cantet
    Abstract:

    Metafounders are pseudo-individuals that encapsulate genetic heterozygosity and Relationships within and across base pedigree populations, i.e. ancestral populations. This work addresses the estimation and usefulness of metafounder Relationships in single-step genomic best linear unbiased prediction (ssGBLUP). We show that ancestral Relationship parameters are proportional to standardized covariances of base allelic frequencies across populations, such as Fst fixation indexes. These covariances of base allelic frequencies can be estimated from marker genotypes of related recent individuals and pedigree. Simple methods for their estimation include naive computation of allele frequencies from marker genotypes or a method of moments that equates average pedigree-based and marker-based Relationships. Complex methods include generalized least squares (best linear unbiased estimator (BLUE)) or maximum likelihood based on pedigree Relationships. To our knowledge, methods to infer Fst coefficients from marker data have not been developed for related individuals. We derived a genomic Relationship Matrix, compatible with pedigree Relationships, that is constructed as a cross-product of {−1,0,1} codes and that is equivalent (apart from scale factors) to an identity-by-state Relationship Matrix at genome-wide markers. Using a simulation with a single population under selection in which only males and youngest animals are genotyped, we observed that generalized least squares or maximum likelihood gave accurate and unbiased estimates of the ancestral Relationship parameter (true value: 0.40) whereas the naive method and the method of moments were biased (average estimates of 0.43 and 0.35).
    We also observed that genomic evaluation by ssGBLUP using metafounders was less biased in terms of estimates of genetic trend (bias of 0.01 instead of 0.12), resulted in less overdispersed (0.94 instead of 0.99) and as accurate (0.74) estimates of breeding values than ssGBLUP without metafounders and provided consistent estimates of heritability. Estimation of metafounder Relationships can be achieved using BLUP-like methods with pedigree and markers. Inclusion of metafounder Relationships reduces bias of genomic predictions with no loss in accuracy.
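
The cross-product construction mentioned above can be illustrated with a short sketch. It assumes genotypes are stored as allele counts 0, 1, 2 and that the scale factor is half the number of markers, which corresponds to treating all allele frequencies as 0.5; the abstract only states equivalence "apart from scale factors", so that divisor is an assumption. The function name is hypothetical.

```python
def cross_product_grm(genotypes):
    """Genomic relationship matrix from {-1, 0, 1} codes (illustrative sketch).

    genotypes: one list per animal holding allele counts 0, 1 or 2 per marker.
    """
    n_markers = len(genotypes[0])
    # Recode allele counts 0/1/2 as -1/0/1.
    z = [[g - 1 for g in animal] for animal in genotypes]
    # Assumed scale: 2 * sum(p * (1 - p)) with p = 0.5 for every marker.
    scale = n_markers / 2.0
    return [[sum(a * b for a, b in zip(zi, zj)) / scale for zj in z] for zi in z]
```

Two animals that are opposite homozygotes at most markers get a negative off-diagonal element, while fully homozygous animals get a diagonal above 1.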

  • metafounders are fst fixation indices and reduce bias in single step genomic evaluations
    bioRxiv, 2016
    Co-Authors: Carolina A Garciabaccino, I. Misztal, Andres Legarra, Ivan Pocrnic, Ole F Christensen, Zulma G Vitezica, R J C Cantet
    Abstract:

    BACKGROUND: Metafounders are pseudo-individuals that condense the genetic heterozygosity and Relationships within and across base pedigree populations, i.e. ancestral populations. This work addresses estimation and usefulness of metafounder Relationships in Single Step GBLUP. RESULTS: We show that the ancestral Relationship parameters are proportional to standardized covariances of base allelic frequencies across populations, like Fst fixation indexes. These covariances of base allelic frequencies can be estimated from marker genotypes of related recent individuals, and pedigree. Simple methods for estimation include naive computation of allele frequencies from marker genotypes or a method of moments equating average pedigree-based and marker-based Relationships. Complex methods include generalized least squares or maximum likelihood based on pedigree Relationships. To our knowledge, methods to infer Fst coefficients and Fst differentiation have not been developed for related populations. A compatible genomic Relationship Matrix constructed as a crossproduct of {-1,0,1} codes, and equivalent (up to scale factors) to an identity by state Relationship Matrix at the markers, is derived. Using a simulation with a single population under selection, in which only males and youngest animals were genotyped, we observed that generalized least squares or maximum likelihood gave accurate and unbiased estimates of the ancestral Relationship parameter (true value: 0.40) whereas the other two (naive and method of moments) were biased (estimates of 0.43 and 0.35). We also observed that genomic evaluation by Single Step GBLUP using metafounders was less biased in terms of accurate genetic trend (0.01 instead of 0.12 bias), slightly overdispersed (0.94 instead of 0.99) and as accurate (0.74) as the regular Single Step GBLUP. Single Step GBLUP using metafounders also provided consistent estimates of heritability.
    CONCLUSIONS: Estimation of metafounder Relationship can be achieved using BLUP-like methods with pedigree and markers. Inclusion of metafounder Relationships reduces bias of genomic predictions with no loss in accuracy.

  • single step a general approach for genomic selection
    Livestock Science, 2014
    Co-Authors: Andres Legarra, I. Aguilar, Ole F Christensen, I. Misztal
    Abstract:

    Genomic evaluation methods assume that the reference population is genotyped and phenotyped. This is most often false and the generation of pseudo-phenotypes is uncertain and inaccurate. However, markers obey transmission rules and therefore the covariances of marker genotypes across individuals can be modelled using pedigree Relationships. Based on this, an extension of the genomic Relationship Matrix can be constructed in which genomic Relationships are propagated to all individuals, resulting in a combined Relationship Matrix, which can be used in a BLUP procedure called the Single Step Genomic BLUP. This procedure provides so far the most comprehensive option for genomic evaluation. Several extensions, options and details are described: compatibility of genomic and pedigree Relationships, Bayesian regressions, multiple trait models, computational aspects, etc. Many details scattered through a series of papers are put together into this paper.
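
The combined Relationship Matrix is most often used through its inverse, which has a simple structure. The expression below is the form commonly given in the single-step literature rather than something stated in this abstract; subscript 2 denotes genotyped animals, A is the pedigree Relationship Matrix, and G is the genomic Relationship Matrix.

```latex
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{bmatrix}
\mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}
\end{bmatrix}
```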

  • compatibility of pedigree based and marker based Relationship matrices for single step genetic evaluation
    Genetics Selection Evolution, 2012
    Co-Authors: Ole F Christensen
    Abstract:

    Single-step methods provide a coherent and conceptually simple approach to incorporate genomic information into genetic evaluations. An issue with single-step methods is compatibility between the marker-based Relationship Matrix for genotyped animals and the pedigree-based Relationship Matrix. Therefore, it is necessary to adjust the marker-based Relationship Matrix to the pedigree-based Relationship Matrix. Moreover, with data from routine evaluations, this adjustment should in principle be based on both observed marker genotypes and observed phenotypes, but until now this has been overlooked. In this paper, I propose a new method to address this issue by 1) adjusting the pedigree-based Relationship Matrix to be compatible with the marker-based Relationship Matrix instead of the reverse and 2) extending the single-step genetic evaluation using a joint likelihood of observed phenotypes and observed marker genotypes. The performance of this method is then evaluated using two simulated datasets. The method derived here is a single-step method in which the marker-based Relationship Matrix is constructed assuming all allele frequencies equal to 0.5 and the pedigree-based Relationship Matrix is constructed using the unusual assumption that animals in the base population are related and inbred with a Relationship coefficient γ and an inbreeding coefficient γ/2. Taken together, this γ parameter and a parameter that scales the marker-based Relationship Matrix can handle the issue of compatibility between marker-based and pedigree-based Relationship matrices. The full log-likelihood function used for parameter inference contains two terms. The first term is the REML-log-likelihood for the phenotypes conditional on the observed marker genotypes, whereas the second term is the log-likelihood for the observed marker genotypes. 
    Analyses of the two simulated datasets with this new method showed that 1) the parameters involved in adjusting marker-based and pedigree-based Relationship matrices can depend on both observed phenotypes and observed marker genotypes and 2) a strong association between these two parameters exists. Finally, this method performed at least as well as a method based on adjusting the marker-based Relationship Matrix. Using the full log-likelihood and adjusting the pedigree-based Relationship Matrix to be compatible with the marker-based Relationship Matrix provides a new and interesting approach to handle the issue of compatibility between the two matrices in single-step genetic evaluation.
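
The adjustment of the pedigree-based Relationship Matrix can be sketched as follows. The sketch assumes a single base population and the form A(1 - γ/2) + γ for every element, which gives base animals a Relationship of γ and an inbreeding coefficient of γ/2 as described in the abstract; the function name and the dense list representation are illustrative choices, not the paper's code.

```python
def adjust_pedigree_matrix(A, gamma):
    """Return the pedigree relationship matrix with related, inbred base animals.

    Assumed form: A_gamma = A * (1 - gamma / 2) + gamma (added to every element).
    """
    factor = 1.0 - gamma / 2.0
    return [[a * factor + gamma for a in row] for row in A]
```

For two unrelated base animals and their offspring with γ = 0.4, the base diagonals become 1.2 (inbreeding 0.2 = γ/2) and the base off-diagonal becomes 0.4 (= γ).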

M E Goddard - One of the best experts on this subject based on the ideXlab platform.

  • mixed model with correction for case control ascertainment increases association power
    bioRxiv, 2014
    Co-Authors: Tristan J Hayeck, M E Goddard, Bjarni J Vilhjalmsson, Noah Zaitlen, Poru Loh, Samuela Pollack, Alexander Gusev, Jian Yang, Guobo Chen, Peter M Visscher
    Abstract:

    We introduce a Liability Threshold Mixed Linear Model (LTMLM) association statistic for ascertained case-control studies that increases power vs. existing mixed model methods, with a well-controlled false-positive rate. Recent work has shown that existing mixed model methods suffer a loss in power under case-control ascertainment, but no solution has been proposed. Here, we solve this problem using a chi-square score statistic computed from posterior mean liabilities (PML) under the liability threshold model. Each individual’s PML is conditional not only on that individual’s case-control status, but also on every individual’s case-control status and on the genetic Relationship Matrix obtained from the data. The PML are estimated using a multivariate Gibbs sampler, with the liability-scale phenotypic covariance Matrix based on the genetic Relationship Matrix (GRM) and a heritability parameter estimated via Haseman-Elston regression on case-control phenotypes followed by transformation to liability scale. In simulations of unrelated individuals, the LTMLM statistic was correctly calibrated and achieved higher power than existing mixed model methods in all scenarios tested, with the magnitude of the improvement depending on sample size and severity of case-control ascertainment. In a WTCCC2 multiple sclerosis data set with >10,000 samples, LTMLM was correctly calibrated and attained a 4.1% improvement (P=0.007) in chi-square statistics (vs. existing mixed model methods) at 75 known associated SNPs, consistent with simulations. Larger increases in power are expected at larger sample sizes. In conclusion, an increase in power over existing mixed model methods is available for ascertained case-control studies of diseases with low prevalence.
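
As a rough illustration of the Haseman-Elston step mentioned above, the sketch below regresses products of phenotypes on off-diagonal GRM elements through the origin. It assumes the phenotypes have already been standardized to mean 0 and variance 1, and it stops at the observed-scale estimate; the transformation to the liability scale and the Gibbs sampler for the PML are not shown. The function name is hypothetical.

```python
def haseman_elston_h2(y, grm):
    """Observed-scale heritability from Haseman-Elston regression (sketch).

    y:   standardized phenotypes (assumed mean 0, variance 1).
    grm: genetic relationship matrix as a list of lists.
    """
    n = len(y)
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(i + 1, n):          # each pair of individuals once
            num += grm[i][j] * y[i] * y[j]
            den += grm[i][j] ** 2
    return num / den                        # slope of y_i * y_j on grm_ij
```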

  • a single step genomic model with direct estimation of marker effects
    Journal of Dairy Science, 2014
    Co-Authors: M E Goddard, F Reinhardt, R Reents
    Abstract:

    Compared with the currently widely used multi-step genomic models for genomic evaluation, single-step genomic models can provide more accurate genomic evaluation by jointly analyzing phenotypes and genotypes of all animals and can properly correct for the effect of genomic preselection on genetic evaluations. The objectives of this study were to introduce a single-step genomic model, allowing a direct estimation of single nucleotide polymorphism (SNP) effects, and to develop efficient computing algorithms for solving equations of the single-step SNP model. We proposed an alternative to the current single-step genomic model based on the genomic Relationship Matrix by including an additional step for estimating the effects of SNP markers. Our single-step SNP model allowed flexible modeling of SNP effects in terms of the number and variance of SNP markers. Moreover, our single-step SNP model included a residual polygenic effect with trait-specific variance for reducing inflation in genomic prediction. A kernel calculation of the SNP model involved repeated multiplications of the inverse of the pedigree Relationship Matrix of genotyped animals with a vector, for which numerical methods such as preconditioned conjugate gradients can be used. For estimating SNP effects, a special updating algorithm was proposed to separate residual polygenic effects from the SNP effects. We extended our single-step SNP model to general multiple-trait cases. By taking advantage of a block-diagonal (co)variance Matrix of SNP effects, we showed how to estimate multivariate SNP effects in an efficient way. A general prediction formula was derived for candidates without phenotypes, which can be used for frequent, interim genomic evaluations without running the whole genomic evaluation process. We discussed various issues related to implementation of the single-step SNP model in Holstein populations with an across-country genomic reference population.
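
The preconditioned conjugate gradient method named above can be sketched generically. This is a textbook version with a diagonal (Jacobi) preconditioner, which is an assumption made here for simplicity; it is not the updating algorithm proposed in the paper, and the dense matrix-vector product stands in for whatever operator a real evaluation would apply.

```python
def pcg(A, b, tol=1e-10, max_rounds=1000):
    """Solve A x = b for symmetric positive definite A; returns (x, rounds)."""
    n = len(b)
    diag_inv = [1.0 / A[i][i] for i in range(n)]      # Jacobi preconditioner
    x = [0.0] * n
    r = list(b)                                        # residual for x = 0
    z = [d * ri for d, ri in zip(diag_inv, r)]
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for rounds in range(1, max_rounds + 1):
        Ap = [sum(a * pj for a, pj in zip(row, p)) for row in A]
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:      # converged on residual norm
            return x, rounds
        z = [d * ri for d, ri in zip(diag_inv, r)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_rounds
```

The second return value is the number of rounds to convergence, the quantity typically reported when comparing solver settings.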

  • improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high density single nucleotide polymorphism panels
    Journal of Dairy Science, 2012
    Co-Authors: Malena Erbe, Ben J. Hayes, Lakshmi K Matukumalli, S Goswami, P J Bowman, C M Reich, B A Mason, M E Goddard
    Abstract:

    Achieving accurate genomic estimated breeding values for dairy cattle requires a very large reference population of genotyped and phenotyped individuals. Assembling such reference populations has been achieved for breeds such as Holstein, but is challenging for breeds with fewer individuals. An alternative is to use a multi-breed reference population, such that smaller breeds gain some advantage in accuracy of genomic estimated breeding values (GEBV) from information from larger breeds. However, this requires that marker-quantitative trait loci associations persist across breeds. Here, we assessed the gain in accuracy of GEBV in Jersey cattle as a result of using a combined Holstein and Jersey reference population, with either 39,745 or 624,213 single nucleotide polymorphism (SNP) markers. The surrogate used for accuracy was the correlation of GEBV with daughter trait deviations in a validation population. Two methods were used to predict breeding values, either a genomic BLUP (GBLUP_mod), or a new method, BayesR, which used a mixture of normal distributions as the prior for SNP effects, including one distribution that set SNP effects to zero. The GBLUP_mod method scaled both the genomic Relationship Matrix and the additive Relationship Matrix to a base at the time the breeds diverged, and regressed the genomic Relationship Matrix to account for sampling errors in estimating Relationship coefficients due to a finite number of markers, before combining the 2 matrices. Although these modifications did result in less biased breeding values for Jerseys compared with an unmodified genomic Relationship Matrix, BayesR gave the highest accuracies of GEBV for the 3 traits investigated (milk yield, fat yield, and protein yield), with an average increase in accuracy compared with GBLUP_mod across the 3 traits of 0.05 for both Jerseys and Holsteins. 
    The advantage was limited for either Jerseys or Holsteins in using 624,213 SNP rather than 39,745 SNP (0.01 for Holsteins and 0.03 for Jerseys, averaged across traits). Even this limited and nonsignificant advantage was only observed when BayesR was used. An alternative panel, which extracted the SNP in the transcribed part of the bovine genome from the 624,213 SNP panel (to give 58,532 SNP), performed better, with an increase in accuracy of 0.03 for Jerseys across traits. This panel captures much of the increased genomic content of the 624,213 SNP panel, with the advantage of a greatly reduced number of SNP effects to estimate. Taken together, using this panel, a combined breed reference and using BayesR rather than GBLUP_mod increased the accuracy of GEBV in Jerseys from 0.43 to 0.52, averaged across the 3 traits.

  • using the genomic Relationship Matrix to predict the accuracy of genomic selection
    Journal of Animal Breeding and Genetics, 2011
    Co-Authors: M E Goddard, Ben J. Hayes, T H E Meuwissen
    Abstract:

    Estimated breeding values (EBVs) using data from genetic markers can be predicted using a genomic Relationship Matrix, derived from animals' genotypes, and best linear unbiased prediction. However, if the accuracy of the EBVs is calculated in the usual manner (from the inverse element of the coefficient Matrix), it is likely to be overestimated owing to sampling errors in elements of the genomic Relationship Matrix. We show here that the correct accuracy can be obtained by regressing the Relationship Matrix towards the pedigree Relationship Matrix so that it is an unbiased estimate of the Relationships at the QTL controlling the trait. This method shows how the accuracy increases as the number of markers used increases because the regression coefficient (of genomic Relationship towards pedigree Relationship) increases. We also present a deterministic method for predicting the accuracy of such genomic EBVs before data on individual animals are collected. This method estimates the proportion of genetic variance explained by the markers, which is equal to the regression coefficient described above, and the accuracy with which marker effects are estimated. The latter depends on the variance in Relationship between pairs of animals, which equals the mean linkage disequilibrium over all pairs of loci. The theory was validated using simulated data and data on fat concentration in the milk of Holstein cattle.
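
The regression of the genomic Relationship Matrix towards the pedigree Relationship Matrix can be sketched element by element. The form A + b(G - A) is assumed here, with the regression coefficient b supplied by the caller; how b is estimated is not shown, and the function name is hypothetical.

```python
def regress_towards_pedigree(G, A, b):
    """Shrink genomic relationships towards pedigree relationships (sketch).

    Assumed form: G_reg = A + b * (G - A), with 0 <= b <= 1.
    b = 1 returns G unchanged; b = 0 returns A.
    """
    return [[a + b * (g - a) for g, a in zip(g_row, a_row)]
            for g_row, a_row in zip(G, A)]
```

A larger number of markers corresponds to b closer to 1, so the regressed matrix relies more on the genomic information.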

  • accuracy of genomic breeding values in multi breed dairy cattle populations
    Genetics Selection Evolution, 2009
    Co-Authors: Ben J. Hayes, Phillip J Bowman, Amanda J Chamberlain, Klara L Verbyla, M E Goddard
    Abstract:

    Two key findings from genomic selection experiments are 1) the reference population used must be very large to subsequently predict accurate genomic estimated breeding values (GEBV), and 2) prediction equations derived in one breed do not predict accurate GEBV when applied to other breeds. Both findings are a problem for breeds where the number of individuals in the reference population is limited. A multi-breed reference population is a potential solution, and here we investigate the accuracies of GEBV in Holstein dairy cattle and Jersey dairy cattle when the reference population is single breed or multi-breed. The accuracies were obtained both as a function of elements of the inverse coefficient Matrix and from the realised accuracies of GEBV. Best linear unbiased prediction with a multi-breed genomic Relationship Matrix (GBLUP) and two Bayesian methods (BAYESA and BAYES_SSVS) which estimate individual SNP effects were used to predict GEBV for 400 and 77 young Holstein and Jersey bulls respectively, from a reference population of 781 and 287 Holstein and Jersey bulls, respectively. Genotypes of 39,048 SNP markers were used. Phenotypes in the reference population were de-regressed breeding values for production traits. For the GBLUP method, expected accuracies calculated from the diagonal of the inverse of coefficient Matrix were compared to realised accuracies. When GBLUP was used, expected accuracies from a function of elements of the inverse coefficient Matrix agreed reasonably well with realised accuracies calculated from the correlation between GEBV and EBV in single breed populations, but not in multi-breed populations. When the Bayesian methods were used, realised accuracies of GEBV were up to 13% higher when the multi-breed reference population was used than when a pure breed reference was used. However no consistent increase in accuracy across traits was obtained. 
    Predicting genomic breeding values using a genomic Relationship Matrix is an attractive approach to implement genomic selection as expected accuracies of GEBV can be readily derived. However, in multi-breed populations, Bayesian approaches give higher accuracies for some traits. Finally, multi-breed reference populations will be a valuable resource to fine map QTL.
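
The expected accuracy derived from the inverse coefficient Matrix can be illustrated with the standard relation between prediction error variance (PEV) and accuracy. The sketch assumes the mixed model equations were divided by the residual variance, so that PEV equals the diagonal element of the inverse times that variance; parameter names are illustrative.

```python
import math


def expected_accuracy(c_ii, var_e, var_g):
    """Expected accuracy of a (G)EBV from a diagonal element of the inverse
    coefficient matrix (sketch).

    c_ii:  diagonal element of the inverse coefficient matrix for the animal.
    var_e: residual variance (assumed scaling of the mixed model equations).
    var_g: additive genetic variance.
    """
    pev = c_ii * var_e                  # prediction error variance
    reliability = 1.0 - pev / var_g     # squared accuracy
    return math.sqrt(reliability)
```

The realised accuracy, by contrast, is computed from validation data as the correlation between GEBV and EBV, so the two need not agree, as the abstract reports for multi-breed populations.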