Molecular Data

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The experts below are selected from a list of 916926 experts worldwide ranked by the ideXlab platform

Giorgio Valentini - One of the best experts on this subject based on the ideXlab platform.

  • Discovering multi-level structures in bio-molecular data through the Bernstein inequality
    BMC Bioinformatics, 2008
    Co-Authors: Alberto Bertoni, Giorgio Valentini
    Abstract:

    The unsupervised discovery of structures (i.e. clusterings) underlying data is a central issue in several branches of bioinformatics. Methods based on the concept of stability have been recently proposed to assess the reliability of a clustering procedure and to estimate the "optimal" number of clusters in bio-molecular data. A major problem with stability-based methods is the detection of multi-level structures (e.g. hierarchical functional classes of genes), and the assessment of their statistical significance. In this context, a chi-square based statistical test of hypothesis has been proposed; however, to assure the correctness of this technique some assumptions about the distribution of the data are needed. To assess the statistical significance and to discover multi-level structures in bio-molecular data, a new method based on Bernstein's inequality is proposed. This approach makes no assumptions about the distribution of the data, thus assuring a reliable application to a large range of bioinformatics problems. Results with synthetic and DNA microarray data show the effectiveness of the proposed method. The Bernstein test, due to its loose assumptions, is more sensitive than the chi-square test to the detection of multiple structures simultaneously present in the data. Nevertheless, it is less selective, that is, subject to more false positives; by adding independence assumptions, a more selective variant of the Bernstein inequality-based test is also presented. The proposed methods can be applied to discover multiple structures and to assess their significance in different types of bio-molecular data.

  • Model order selection for bio-molecular data clustering
    BMC Bioinformatics, 2007
    Co-Authors: Alberto Bertoni, Giorgio Valentini
    Abstract:

    Background: Cluster analysis has been widely applied for investigating structure in bio-molecular data. A drawback of most clustering algorithms is that they cannot automatically detect the "natural" number of clusters underlying the data, and in many cases we do not have enough a priori biological knowledge to evaluate both the number of clusters and their validity. Recently several methods based on the concept of stability have been proposed to estimate the "optimal" number of clusters, but despite their successful application to the analysis of complex bio-molecular data, the assessment of the statistical significance of the discovered clustering solutions and the detection of multiple structures simultaneously present in high-dimensional bio-molecular data are still major problems. Results: We propose a stability method based on randomized maps that exploits the high dimensionality and relatively low cardinality that characterize bio-molecular data, by selecting subsets of randomized linear combinations of the input variables, and by using stability indices based on the overall distribution of similarity measures between multiple pairs of clusterings performed on the randomly projected data. A χ²-based statistical test is proposed to assess the significance of the clustering solutions and to detect significant and, if possible, multi-level structures simultaneously present in the data (e.g. hierarchical structures). Conclusion: The experimental results show that our model order selection methods are competitive with other state-of-the-art stability-based algorithms and are able to detect multiple levels of structure underlying both synthetic and gene expression data.
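
The randomized-map stability idea in the model order selection abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the random ±1 projection, the use of the Rand index as the similarity measure, and all function names (`random_projection`, `pair_agreement`, `stability_scores`) are choices made here for illustration, and the clustering routine `cluster` is supplied by the caller.

```python
import math
import random
from itertools import combinations


def random_projection(data, d_out, rng):
    """Project the rows of `data` (n x d_in) onto d_out random directions.

    Assumption: entries of the projection matrix are +1 or -1 with equal
    probability, scaled by 1/sqrt(d_out). Other randomized maps are possible.
    """
    d_in = len(data[0])
    scale = 1.0 / math.sqrt(d_out)
    proj = [[rng.choice((-1.0, 1.0)) * scale for _ in range(d_out)]
            for _ in range(d_in)]
    return [
        [sum(row[i] * proj[i][j] for i in range(d_in)) for j in range(d_out)]
        for row in data
    ]


def pair_agreement(labels_a, labels_b):
    """Rand index: fraction of item pairs on which two clusterings agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def stability_scores(data, cluster, k, d_out, n_pairs, seed=0):
    """Similarity between clusterings of independently projected copies of `data`.

    `cluster(points, k)` is any caller-supplied function returning one label
    per point. The distribution of the returned scores is what a stability
    index for this k would be computed from.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_pairs):
        labels_a = cluster(random_projection(data, d_out, rng), k)
        labels_b = cluster(random_projection(data, d_out, rng), k)
        scores.append(pair_agreement(labels_a, labels_b))
    return scores
```

Repeating `stability_scores` for several values of `k` gives one score distribution per candidate number of clusters; values of `k` whose scores concentrate near 1 are the stable candidates that a significance test would then compare.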

M.L. Dubernet - One of the best experts on this subject based on the ideXlab platform.

  • The virtual atomic and molecular data centre (VAMDC) consortium
    Journal of Physics B: Atomic, Molecular and Optical Physics, 2016
    Co-Authors: M.L. Dubernet, Vincent Boudon, Bobby Antony, Yu L. Babikov, Klaus Bartschat, Bastiaan J. Braams, Hyun-Kyung Chung, Fabien Daniel, Franck Delahaye
    Abstract:

    The Virtual Atomic and Molecular Data Centre (VAMDC) Consortium is a worldwide consortium which federates atomic and molecular databases through an e-science infrastructure and an organisation to support this activity. About 90% of the inter-connected databases handle data that are used for the interpretation of astronomical spectra and for modelling in many fields of astrophysics. Recently the VAMDC Consortium has connected databases from the radiation damage and the plasma communities, as well as promoting the publication of data from Indian institutes. This paper describes how the VAMDC Consortium is organised for the optimal distribution of atomic and molecular data for scientific research. It is noted that the VAMDC Consortium strongly advocates that authors of research papers using data cite the original experimental and theoretical papers as well as the relevant databases.

  • "Virtual Atomic and Molecular Data Centre" and Astrophysics: Level 2 Release
    2012
    Co-Authors: M. Doronin, M.L. Dubernet, P. Le Sidaner, Nic Walton, Nigel J. Mason, Nikolai Piskunov, G. Rixon, Stephan Schlemmer, Jonathan Tennyson, Asif Akram
    Abstract:

    The Virtual Atomic and Molecular Data Centre (VAMDC, http://www.vamdc.eu/) is a consortium between groups involved in the generation, evaluation, and use of atomic and molecular data, funded by the European Union. VAMDC aims to build a reliable, open, flexible and interoperable e-science interface to existing atomic and molecular data. The project will cover establishing the core consortium, the development and deployment of the infrastructure and the development of interfaces to the existing atomic and molecular databases. This paper describes the organisation of the project and the achievements at the end of its second year.

  • Virtual Atomic and Molecular Data Centre level 3 service and future prospects
    Highlights of Astronomy, 2012
    Co-Authors: M.L. Dubernet, G. Rixon, M. Doronin
    Abstract:

    The Virtual Atomic and Molecular Data Centre (VAMDC, http://www.vamdc.eu) is an international consortium that has created an interoperable e-science infrastructure for the exchange of atomic and molecular data. The VAMDC defines standards for the exchange of atomic and molecular data, develops reference implementations of those standards, deploys registries of internet resources (yellow pages), designs user applications to meet user needs, builds data access layers above databases to provide unified outputs from these databases, handles asynchronous queries with workflows, and connects its infrastructure to the grid. The paper describes the current service deployment of the VAMDC data infrastructure across its registered databases and the key features of the current infrastructure.

  • VAMDC: The Virtual Atomic and Molecular Data Center
    2011
    Co-Authors: Nic Walton, M.L. Dubernet, Nigel J. Mason, Nikolai Piskunov, G. Rixon
    Abstract:

    The Virtual Atomic and Molecular Data Center (VAMDC) is a European Union funded collaboration between groups involved in the generation, evaluation, and use of atomic and molecular data. VAMDC aims ...

  • VAMDC - The Virtual Atomic and Molecular Data Centre - A new way to disseminate atomic and molecular data - VAMDC level 1 release
    2011
    Co-Authors: G. Rixon, M.L. Dubernet, P. Le Sidaner, Nic Walton, Nigel J. Mason, Nikolai Piskunov, Stephan Schlemmer, Jonathan Tennyson, Asif Akram, Kevin Benson
    Abstract:

    The Virtual Atomic and Molecular Data Centre (VAMDC, http://www.vamdc.eu/) is a European-Union-funded collaboration between groups involved in the generation, evaluation, and use of atomic and molecular data. VAMDC aims to build a reliable, open, flexible and interoperable e-science interface to existing atomic and molecular data. The project will cover establishing the core consortium, the development and deployment of the infrastructure and the development of interfaces to the existing atomic and molecular databases. This paper describes the organisation of the project and the achievements during its first year.
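
The abstracts above describe data access layers that sit above heterogeneous databases and return unified outputs. A minimal sketch of that general pattern is given below. Everything in it is hypothetical: the node classes, field names, record type and numeric values are placeholders invented for illustration, and it does not reproduce the actual VAMDC standards, registries or software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """Hypothetical common record that every adapter returns."""
    species: str
    wavelength_nm: float
    source: str


class NodeA:
    """Hypothetical database that stores wavelengths in angstroms."""
    name = "node-a"
    rows = [("Fe", 5270.0), ("Na", 5890.0)]  # placeholder values

    def query(self, species):
        # Convert the native unit (angstrom) to the shared unit (nm).
        return [Transition(s, w / 10.0, self.name)
                for s, w in self.rows if s == species]


class NodeB:
    """Hypothetical database that stores wavelengths in micrometres."""
    name = "node-b"
    rows = [{"sp": "Na", "um": 0.5896}, {"sp": "Fe", "um": 0.4384}]  # placeholders

    def query(self, species):
        # Convert the native unit (micrometre) to the shared unit (nm).
        return [Transition(r["sp"], r["um"] * 1000.0, self.name)
                for r in self.rows if r["sp"] == species]


def federated_query(registry, species):
    """Send one query to every registered node and merge the unified records."""
    results = []
    for node in registry:
        results.extend(node.query(species))
    return sorted(results, key=lambda t: t.wavelength_nm)
```

Calling `federated_query([NodeA(), NodeB()], "Na")` returns records from both placeholder nodes in one list with a single unit, which is the point of the adapter layer: the caller never sees each node's native schema.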

Alberto Bertoni - One of the best experts on this subject based on the ideXlab platform.

  • Discovering multi-level structures in bio-molecular data through the Bernstein inequality
    BMC Bioinformatics, 2008
    Co-Authors: Alberto Bertoni, Giorgio Valentini
    Abstract:

    The unsupervised discovery of structures (i.e. clusterings) underlying data is a central issue in several branches of bioinformatics. Methods based on the concept of stability have been recently proposed to assess the reliability of a clustering procedure and to estimate the "optimal" number of clusters in bio-molecular data. A major problem with stability-based methods is the detection of multi-level structures (e.g. hierarchical functional classes of genes), and the assessment of their statistical significance. In this context, a chi-square based statistical test of hypothesis has been proposed; however, to assure the correctness of this technique some assumptions about the distribution of the data are needed. To assess the statistical significance and to discover multi-level structures in bio-molecular data, a new method based on Bernstein's inequality is proposed. This approach makes no assumptions about the distribution of the data, thus assuring a reliable application to a large range of bioinformatics problems. Results with synthetic and DNA microarray data show the effectiveness of the proposed method. The Bernstein test, due to its loose assumptions, is more sensitive than the chi-square test to the detection of multiple structures simultaneously present in the data. Nevertheless, it is less selective, that is, subject to more false positives; by adding independence assumptions, a more selective variant of the Bernstein inequality-based test is also presented. The proposed methods can be applied to discover multiple structures and to assess their significance in different types of bio-molecular data.

  • Model order selection for bio-molecular data clustering
    BMC Bioinformatics, 2007
    Co-Authors: Alberto Bertoni, Giorgio Valentini
    Abstract:

    Background: Cluster analysis has been widely applied for investigating structure in bio-molecular data. A drawback of most clustering algorithms is that they cannot automatically detect the "natural" number of clusters underlying the data, and in many cases we do not have enough a priori biological knowledge to evaluate both the number of clusters and their validity. Recently several methods based on the concept of stability have been proposed to estimate the "optimal" number of clusters, but despite their successful application to the analysis of complex bio-molecular data, the assessment of the statistical significance of the discovered clustering solutions and the detection of multiple structures simultaneously present in high-dimensional bio-molecular data are still major problems. Results: We propose a stability method based on randomized maps that exploits the high dimensionality and relatively low cardinality that characterize bio-molecular data, by selecting subsets of randomized linear combinations of the input variables, and by using stability indices based on the overall distribution of similarity measures between multiple pairs of clusterings performed on the randomly projected data. A χ²-based statistical test is proposed to assess the significance of the clustering solutions and to detect significant and, if possible, multi-level structures simultaneously present in the data (e.g. hierarchical structures). Conclusion: The experimental results show that our model order selection methods are competitive with other state-of-the-art stability-based algorithms and are able to detect multiple levels of structure underlying both synthetic and gene expression data.
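
For reference, one classical form of Bernstein's inequality, the bound named in the 2008 abstract above, is stated below. The notation ($X_i$, $M$, $t$) is introduced here and is not taken from the paper, and the way the paper applies the bound to stability indices is not reproduced.

```latex
Let $X_1, \dots, X_n$ be independent random variables with
$\mathbb{E}[X_i] = 0$ and $|X_i| \le M$ almost surely. Then, for every $t > 0$,
\[
  \Pr\!\left( \sum_{i=1}^{n} X_i \ge t \right)
  \le
  \exp\!\left( - \frac{t^2 / 2}{\sum_{i=1}^{n} \mathbb{E}[X_i^2] + M t / 3} \right).
\]
```

The bound depends only on the range and variance of the variables, not on the shape of their distribution, which is consistent with the abstract's remark that the test makes no distributional assumptions about the data.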

James M. Carpenter - One of the best experts on this subject based on the ideXlab platform.

  • Towards simultaneous analysis of morphological and molecular data in Hymenoptera
    Zoologica Scripta, 1999
    Co-Authors: James M. Carpenter, Ward C. Wheeler
    Abstract:

    Principles and methods of simultaneous analysis in cladistics are reviewed, and the first, preliminary, analysis of combined molecular and morphological data on higher level relationships in Hymenoptera is presented to exemplify these principles. The morphological data from the matrix of Ronquist et al. (1999), derived from the character diagnoses of the phylogenetic tree of Rasnitsyn (1988), are combined with new molecular data for representatives of 10 superfamilies of Hymenoptera by means of optimization alignment. The resulting cladogram supports Apocrita and Aculeata as groups, and the superfamily Chrysidoidea, but not Chalcidoidea, Evanioidea, Vespoidea and Apoidea.

Rafael Zardoya - One of the best experts on this subject based on the ideXlab platform.

  • Phylogenetic relationships of Iberian Aphodiini (Coleoptera: Scarabaeidae) based on morphological and molecular data.
    Molecular Phylogenetics and Evolution, 2004
    Co-Authors: Francisco-José Cabrero-Sañudo, Rafael Zardoya
    Abstract:

    A phylogeny of Iberian Aphodiini dung beetles was reconstructed based on morphological and molecular data. The data set included a total of 84 variable characters from wing venation, mouthparts, genitalia, and external morphology, as well as mitochondrial partial cytochrome c oxidase I (COI), complete tRNA-Leu (UUR), and partial cytochrome c oxidase II (COII) gene nucleotide sequences (1210 positions). Phylogenetic trees based on molecular data were relatively more resolved than those based on morphological characters. The Bayesian analysis of combined molecular and morphological data provided resolution not achieved by either data set separately. Ammoecius and Aphodius are the first lineages that branch off from the tree, followed by Acrossus, Nimbus, and Heptaulacus. The remaining studied taxa are recovered in a more derived clade that lacks internal resolution. Reconstructed trees based on molecular data showed relatively short internal nodes that were weakly supported. Such a pattern may reflect a rapid radiation at the origin of the tribe Aphodiini, but also saturation of mutational changes. Several tests were conducted to discriminate between the two competing hypotheses, as well as to assess the effect of incomplete taxon sampling.
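
The abstract above does not say which tests were used to examine saturation. As a generic illustration of what saturation means for nucleotide data, the sketch below compares the observed proportion of differing sites (p-distance) with the Jukes-Cantor corrected distance. The choice of the Jukes-Cantor model and the function names are assumptions made here, not details from the paper.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    return diffs / len(seq_a)


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed p-distance.

    The correction grows without bound as p approaches 0.75, the value
    expected for two unrelated random sequences; it is returned as infinity
    once p reaches that limit.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

When the corrected distance is much larger than the observed p-distance, many substitutions have been overwritten by later ones, so observed differences underestimate the true amount of change. That is the sense in which saturated sequences lose phylogenetic signal.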