Large Data Set

The Experts below are selected from a list of 502422 Experts worldwide ranked by ideXlab platform

Didier Casane - One of the best experts on this subject based on the ideXlab platform.

  • Phylogenomics of Eukaryotes: Impact of Missing Data on Large Alignments
    Molecular Biology and Evolution, 2004
    Co-Authors: Herve Philippe, Elizabeth A. Snell, Philippe Lopez, Eric Bapteste, Peter W. H. Holland, Didier Casane
    Abstract:

    Resolving the relationships between Metazoa and other eukaryotic groups, as well as between metazoan phyla, is central to understanding the origin and evolution of animals. The current view is based on limited data sets, either a single gene with many species (e.g., ribosomal RNA) or many genes with only a few species. Because reliable phylogenetic inference requires numerous genes and numerous species simultaneously, we assembled a very large data set containing 129 orthologous proteins (approximately 30,000 aligned amino acid positions) for 36 eukaryotic species. Included in the alignments are data from the choanoflagellate Monosiga ovata, obtained through the sequencing of about 1,000 cDNAs. We provide conclusive support for choanoflagellates as the closest relative of animals and for fungi as the second closest. The monophyly of Plantae and chromalveolates was recovered but without strong statistical support. Within animals, in contrast to the monophyly of Coelomata observed in several recent large-scale analyses, we recovered a paraphyletic Coelomata, with nematodes and platyhelminths nested within. To include a diverse sample of organisms, data from EST projects were used for several species, resulting in a large amount of missing data in our alignment (about 25%). By using different approaches, we verify that the inferred phylogeny is not sensitive to these missing data. Therefore, this large data set provides a reliable phylogenetic framework for studying eukaryotic and animal evolution and will be easily extendable when large amounts of sequence information become available from a broader taxonomic range.

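The "about 25%" missing-data figure in the abstract is a simple ratio of missing cells to all cells in the alignment. The following is a minimal sketch of that calculation, not the authors' pipeline: the function names, the toy alignment, and the choice to count `-`, `?` and `X` as missing are all assumptions made for illustration.

```python
def missing_fraction(alignment, missing_chars="-?X"):
    """Return the fraction of alignment cells that hold a missing-data symbol.

    `alignment` maps a taxon name to its aligned sequence (hypothetical format).
    Which characters count as missing is an assumption, not from the paper.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must share one aligned length")
    total = sum(len(seq) for seq in alignment.values())
    if total == 0:
        raise ValueError("alignment has no positions")
    missing = sum(seq.count(ch) for seq in alignment.values() for ch in missing_chars)
    return missing / total


def missing_by_taxon(alignment, missing_chars="-?X"):
    """Return the per-taxon fraction of missing positions."""
    return {
        name: sum(seq.count(ch) for ch in missing_chars) / len(seq)
        for name, seq in alignment.items()
    }


# Toy alignment with made-up sequences: 4 taxa x 8 positions.
toy = {
    "species_a": "MKTAYIAK",
    "species_b": "MK????AK",  # e.g. a taxon covered only partially by ESTs
    "species_c": "M-TA-I--",
    "species_d": "MKTAYIAK",
}

print(f"overall missing: {missing_fraction(toy):.0%}")
print(missing_by_taxon(toy))
```

Reporting the per-taxon fraction alongside the overall figure shows whether the missing data are spread evenly or concentrated in a few sparsely sequenced species.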

Gardar Johannesson - One of the best experts on this subject based on the ideXlab platform.

  • Fixed rank kriging for very large spatial data sets
    Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2008
    Co-Authors: Noel A Cressie, Gardar Johannesson
    Abstract:

    Spatial statistics for very large spatial data sets is challenging. The size of the data set, n, causes problems in computing optimal spatial predictors such as kriging, since its computational cost is of order n^3. In addition, a large data set is often defined on a large spatial domain, so the spatial process of interest typically exhibits non-stationary behaviour over that domain. A flexible family of non-stationary covariance functions is defined by using a set of basis functions that is fixed in number, which leads to a spatial prediction method that we call fixed rank kriging. Specifically, fixed rank kriging is kriging within this class of non-stationary covariance functions. It relies on computational simplifications when n is very large, for obtaining the spatial best linear unbiased predictor and its mean-squared prediction error for a hidden spatial process. A method based on minimizing a weighted Frobenius norm yields best estimators of the covariance function parameters, which are then substituted into the fixed rank kriging equations. The new methodology is applied to a very large data set of total column ozone data, observed over the entire globe, where n is of the order of hundreds of thousands. Copyright 2008 Royal Statistical Society.

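The computational simplification mentioned in the abstract can be sketched as follows. The notation is introduced here for illustration and is an assumption, not taken from the abstract: S is an n × r matrix of basis functions evaluated at the n data locations (with r fixed and much smaller than n), K is the r × r covariance matrix of the basis-function coefficients, and D is a diagonal n × n matrix collecting the remaining variances. Under these assumptions, the Sherman-Morrison-Woodbury identity replaces the n × n inverse with an r × r one:

```latex
\begin{align}
\Sigma &= S K S^{\top} + D, \\
\Sigma^{-1} &= D^{-1} - D^{-1} S \left( K^{-1} + S^{\top} D^{-1} S \right)^{-1} S^{\top} D^{-1}.
\end{align}
```

Because D is diagonal and the bracketed matrix is only r × r, applying the inverse costs on the order of n r^2 operations rather than n^3, which is linear in n for fixed r.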

Haifeng Chen - One of the best experts on this subject based on the ideXlab platform.

  • In silico log P prediction for a large data set with support vector machines, radial basis neural networks, and multiple linear regression
    Chemical Biology & Drug Design, 2009
    Co-Authors: Haifeng Chen
    Abstract:

    The oil/water partition coefficient (log P) is one of the key properties that determine whether a lead compound can become a drug. In silico log P models based solely on chemical structures have become an important part of modern drug discovery. Here, we report support vector machines, radial basis function neural networks, and multiple linear regression methods to investigate the correlation between partition coefficient and physico-chemical descriptors for a large data set of compounds. The correlation coefficient r^2 between experimental and predicted log P for the training and test sets obtained with support vector machines, radial basis function neural networks, and multiple linear regression is 0.92, 0.90, and 0.88, respectively. The results show that non-linear support vector machines derive statistical models with better predictive ability than those of radial basis function neural networks and multiple linear regression. This indicates that support vector machines can be used as an alternative modeling tool for quantitative structure-property/activity relationship studies.
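The r^2 values quoted above compare experimental and predicted log P. Assuming r^2 here means the squared Pearson correlation (the abstract does not say which definition was used), it can be computed as in the sketch below. The function name and the log P values are made up for illustration and are not data from the paper.

```python
def pearson_r2(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Centered cross-product and sums of squares.
    s_xy = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    s_xx = sum((o - mean_obs) ** 2 for o in observed)
    s_yy = sum((p - mean_pred) ** 2 for p in predicted)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (s_xy * s_xy) / (s_xx * s_yy)


# Hypothetical experimental vs. predicted log P values (not from the paper).
experimental_logp = [0.5, 1.2, 2.3, 3.1, 4.0]
predicted_logp = [0.7, 1.0, 2.5, 2.9, 4.2]

print(f"r^2 = {pearson_r2(experimental_logp, predicted_logp):.3f}")
```

Computing the same statistic separately on training and test compounds is what allows the three model types to be compared on equal terms.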

Herve Philippe - One of the best experts on this subject based on the ideXlab platform.

  • Phylogenomics of Eukaryotes: Impact of Missing Data on Large Alignments
    Molecular Biology and Evolution, 2004
    Co-Authors: Herve Philippe, Elizabeth A. Snell, Philippe Lopez, Eric Bapteste, Peter W. H. Holland, Didier Casane
    Abstract:

    Resolving the relationships between Metazoa and other eukaryotic groups, as well as between metazoan phyla, is central to understanding the origin and evolution of animals. The current view is based on limited data sets, either a single gene with many species (e.g., ribosomal RNA) or many genes with only a few species. Because reliable phylogenetic inference requires numerous genes and numerous species simultaneously, we assembled a very large data set containing 129 orthologous proteins (approximately 30,000 aligned amino acid positions) for 36 eukaryotic species. Included in the alignments are data from the choanoflagellate Monosiga ovata, obtained through the sequencing of about 1,000 cDNAs. We provide conclusive support for choanoflagellates as the closest relative of animals and for fungi as the second closest. The monophyly of Plantae and chromalveolates was recovered but without strong statistical support. Within animals, in contrast to the monophyly of Coelomata observed in several recent large-scale analyses, we recovered a paraphyletic Coelomata, with nematodes and platyhelminths nested within. To include a diverse sample of organisms, data from EST projects were used for several species, resulting in a large amount of missing data in our alignment (about 25%). By using different approaches, we verify that the inferred phylogeny is not sensitive to these missing data. Therefore, this large data set provides a reliable phylogenetic framework for studying eukaryotic and animal evolution and will be easily extendable when large amounts of sequence information become available from a broader taxonomic range.


Noel A Cressie - One of the best experts on this subject based on the ideXlab platform.

  • Fixed rank kriging for very large spatial data sets
    Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2008
    Co-Authors: Noel A Cressie, Gardar Johannesson
    Abstract:

    Spatial statistics for very large spatial data sets is challenging. The size of the data set, n, causes problems in computing optimal spatial predictors such as kriging, since its computational cost is of order n^3. In addition, a large data set is often defined on a large spatial domain, so the spatial process of interest typically exhibits non-stationary behaviour over that domain. A flexible family of non-stationary covariance functions is defined by using a set of basis functions that is fixed in number, which leads to a spatial prediction method that we call fixed rank kriging. Specifically, fixed rank kriging is kriging within this class of non-stationary covariance functions. It relies on computational simplifications when n is very large, for obtaining the spatial best linear unbiased predictor and its mean-squared prediction error for a hidden spatial process. A method based on minimizing a weighted Frobenius norm yields best estimators of the covariance function parameters, which are then substituted into the fixed rank kriging equations. The new methodology is applied to a very large data set of total column ozone data, observed over the entire globe, where n is of the order of hundreds of thousands. Copyright 2008 Royal Statistical Society.
