Data Integration Strategy

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 189 Experts worldwide ranked by ideXlab platform

Witold R Rudnicki - One of the best experts on this subject based on the ideXlab platform.

  • Robust Data Integration Method for Classification of Biomedical Data
    Journal of Medical Systems, 2021
    Co-Authors: Aneta Polewko-Klim, Krzysztof Mnich, Witold R Rudnicki
    Abstract:

    We present a protocol for integrating two types of biological data, clinical and molecular, for more effective classification of patients with cancer. The proposed approach is a hybrid of the early and late data integration strategies. In this hybrid protocol, the set of informative clinical features is extended by the classification results based on molecular data sets. The results are then treated as new synthetic variables. The hybrid protocol was applied to METABRIC breast cancer samples and TCGA urothelial bladder carcinoma samples. Various data types were used for clinical endpoint prediction: clinical data, gene expression, somatic copy number aberrations, RNA-Seq, methylation, and reverse phase protein array. The performance of the hybrid data integration was evaluated with a repeated cross-validation procedure and compared with other methods of data integration: early integration and late integration via super learning. The hybrid method gave results similar to those obtained by the best of the tested variants of super learning. Moreover, the hybrid method allowed for further sensitivity analysis and recursive feature elimination, which led to compact predictive models for cancer clinical endpoints. For breast cancer, the final model consists of eight clinical variables and two synthetic features obtained from molecular data. For urothelial bladder carcinoma, only two clinical features and one synthetic variable were necessary to build the best predictive model. We have shown that the inclusion of the synthetic variables based on RNA expression levels and copy number alterations can lead to improved quality of prognostic tests. Thus, it should be considered for inclusion in wider medical practice.
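
    The hybrid idea in this abstract (classifier outputs from molecular data appended to the clinical features as synthetic variables) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' protocol: the function name `add_synthetic_features`, the toy data, the block sizes, and the choice of scikit-learn's Random Forest with 5-fold out-of-fold predictions are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def add_synthetic_features(clinical, molecular_blocks, y, seed=0):
    """Append one out-of-fold class-1 probability per molecular block
    to the clinical feature matrix (hypothetical helper)."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    synthetic = []
    for block in molecular_blocks:
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        # Out-of-fold predictions: each sample is scored by a model
        # that did not see that sample's label during training.
        proba = cross_val_predict(model, block, y, cv=cv, method="predict_proba")
        synthetic.append(proba[:, 1])
    return np.column_stack([clinical] + synthetic)


# Toy data; every name and size below is illustrative only.
rng = np.random.default_rng(0)
n = 120
y = rng.integers(0, 2, size=n)
clinical = rng.normal(size=(n, 4))
expression = rng.normal(size=(n, 30)) + 0.8 * y[:, None]  # weakly informative block
copy_number = rng.normal(size=(n, 20))                     # uninformative block

hybrid = add_synthetic_features(clinical, [expression, copy_number], y)

# Final classifier trained on clinical + synthetic variables.
final_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(hybrid, y)
```

    In an honest evaluation, the synthetic variables for held-out samples would have to come from models that never saw those samples, which is why the sketch uses out-of-fold predictions rather than in-sample ones.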

  • Data Integration Strategy for Robust Classification of Biomedical Data
    Trends and Innovations in Information Systems and Technologies, WorldCIST (2), 2020
    Co-Authors: Aneta Polewko-Klim, Witold R Rudnicki
    Abstract:

    This paper presents a protocol for integrating the two most common types of biological data (clinical and molecular) for more effective classification of patients with cancer. In this protocol, the most informative features are identified using statistical and information-theory-based selection methods for molecular data and the Boruta algorithm for clinical data. Predictive models are built with the Random Forest classification algorithm. Data integration consists of combining the most informative clinical features with the synthetic features obtained from genetic marker models as input variables for classifier algorithms.

Andre C A Nascimento - One of the best experts on this subject based on the ideXlab platform.

  • A multiple kernel learning algorithm for drug-target interaction prediction
    BMC Bioinformatics, 2016
    Co-Authors: Andre C A Nascimento, Ricardo B C Prudencio, Ivan G Costa
    Abstract:

    Background: Drug-target networks have received a lot of attention in recent years, given their relevance for pharmaceutical innovation and drug lead discovery. Different in silico approaches have been proposed for the identification of new drug-target interactions, many of which are based on kernel methods. Despite technical advances in recent years, these methods are not able to cope with large drug-target interaction spaces or to integrate multiple sources of biological information. Results: We propose KronRLS-MKL, which models the drug-target interaction problem as a link prediction task on bipartite networks. This method allows the integration of multiple heterogeneous information sources for the identification of new interactions, and can also work with networks of arbitrary size. Moreover, it automatically selects the more relevant kernels by returning weights indicating their importance in the drug-target prediction at hand. Empirical analysis on four data sets using twenty distinct kernels indicates that our method has predictive performance higher than or comparable to that of 18 competing methods in all prediction tasks. Moreover, the predicted weights reflect the predictive quality of each kernel in exhaustive pairwise experiments, which indicates that the method succeeds in automatically revealing relevant biological sources. Conclusions: Our analysis shows that the proposed data integration strategy is able to improve the quality of the predicted interactions, and can speed up the identification of new drug-target interactions as well as identify relevant information for the task. Availability: The source code and data sets are available at www.cin.ufpe.br/~acan/kronrlsmkl/.
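
    To make the Kronecker regularized least squares (KronRLS) part of this abstract more concrete, the sketch below scores all drug-target pairs from one combined drug kernel and one combined target kernel. It is a simplified illustration under stated assumptions, not the published KronRLS-MKL code: the kernel weights are fixed by hand rather than learned, and the names `combine_kernels`, `kron_rls_fit_predict`, the toy kernels, and `lam` are hypothetical.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """Convex combination of base kernels. The weights are fixed here;
    learning them is the MKL step that this sketch does not reproduce."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * k for w, k in zip(weights, kernels))


def kron_rls_fit_predict(k_drug, k_target, y, lam=1.0):
    """Closed-form Kronecker RLS scores for every drug-target pair.

    Solves (kron(K_target, K_drug) + lam * I) vec(A) = vec(Y) through the
    eigendecompositions of the two small kernels, so the large Kronecker
    product matrix is never formed.
    """
    evals_d, q_d = np.linalg.eigh(k_drug)
    evals_t, q_t = np.linalg.eigh(k_target)
    z = q_d.T @ y @ q_t                      # labels in the joint eigenbasis
    prod = np.outer(evals_d, evals_t)        # eigenvalues of the Kronecker kernel
    return q_d @ (z * prod / (prod + lam)) @ q_t.T


def linear_kernel(features):
    return features @ features.T


# Toy problem: 5 drugs, 4 targets, two base kernels per side (illustrative only).
rng = np.random.default_rng(0)
drug_kernels = [linear_kernel(rng.normal(size=(5, 3))) for _ in range(2)]
target_kernels = [linear_kernel(rng.normal(size=(4, 3))) for _ in range(2)]
y = rng.integers(0, 2, size=(5, 4)).astype(float)  # known interactions (1) or not (0)

k_drug = combine_kernels(drug_kernels, [0.7, 0.3])
k_target = combine_kernels(target_kernels, [0.5, 0.5])
scores = kron_rls_fit_predict(k_drug, k_target, y, lam=1.0)
```

    The eigendecomposition trick is what lets this kind of model scale to larger interaction spaces: the cost depends on the sizes of the two small kernels, not on the size of their Kronecker product.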

Ivan G Costa - One of the best experts on this subject based on the ideXlab platform.

  • A multiple kernel learning algorithm for drug-target interaction prediction
    BMC Bioinformatics, 2016
    Co-Authors: Andre C A Nascimento, Ricardo B C Prudencio, Ivan G Costa
    Abstract:

    Background: Drug-target networks have received a lot of attention in recent years, given their relevance for pharmaceutical innovation and drug lead discovery. Different in silico approaches have been proposed for the identification of new drug-target interactions, many of which are based on kernel methods. Despite technical advances in recent years, these methods are not able to cope with large drug-target interaction spaces or to integrate multiple sources of biological information. Results: We propose KronRLS-MKL, which models the drug-target interaction problem as a link prediction task on bipartite networks. This method allows the integration of multiple heterogeneous information sources for the identification of new interactions, and can also work with networks of arbitrary size. Moreover, it automatically selects the more relevant kernels by returning weights indicating their importance in the drug-target prediction at hand. Empirical analysis on four data sets using twenty distinct kernels indicates that our method has predictive performance higher than or comparable to that of 18 competing methods in all prediction tasks. Moreover, the predicted weights reflect the predictive quality of each kernel in exhaustive pairwise experiments, which indicates that the method succeeds in automatically revealing relevant biological sources. Conclusions: Our analysis shows that the proposed data integration strategy is able to improve the quality of the predicted interactions, and can speed up the identification of new drug-target interactions as well as identify relevant information for the task. Availability: The source code and data sets are available at www.cin.ufpe.br/~acan/kronrlsmkl/.

Krzysztof Fujarewicz - One of the best experts on this subject based on the ideXlab platform.

  • Integration Strategies of Cross-Platform Microarray Data Sets in Multiclass Classification Problem
    Computational Science and Its Applications – ICCSA 2019 (5), 2019
    Co-Authors: Sebastian Student, Agata Wilk, Wojciech Bensz, Alicja Płuciennik, Krzysztof Łakomiec, Krzysztof Fujarewicz
    Abstract:

    Despite the increasing amount of available gene expression data, integrative analysis is still hindered by its high susceptibility to microenvironment fluctuations, resulting in inter-experiment variability known as batch effects. Therefore, the development of a data integration strategy is now more necessary than ever. Although several normalization algorithms have already been proposed, we believe that an effective model must rely on data migration between schemes. In this paper we apply this approach to a set of microarray data from core needle biopsies of breast cancers spanning different microarray platforms, and demonstrate its effectiveness in data preparation for unsupervised analysis and multiclass classification tasks. We propose a custom tool dedicated to defining the model structure. Additionally, we compare several data processing pipelines, combining data normalization with different batch effect correction methods.
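
    As a rough illustration of what a batch effect correction step can look like, the sketch below applies per-batch mean centering, one of the simplest location-only corrections. It is not the method or tool proposed in the paper; the function name `center_by_batch`, the toy platforms "A" and "B", and the data sizes are assumptions made for the example.

```python
import numpy as np


def center_by_batch(expr, batches):
    """Subtract each batch's per-gene mean, then restore the overall
    per-gene mean so values stay on the original scale.

    expr    : samples x genes matrix
    batches : one batch label per sample
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    corrected = expr.copy()
    grand_mean = expr.mean(axis=0)
    for label in np.unique(batches):
        rows = batches == label
        corrected[rows] -= expr[rows].mean(axis=0)
    return corrected + grand_mean


# Toy data: the same 3 genes measured on two platforms with shifted intensity scales.
rng = np.random.default_rng(0)
platform_a = rng.normal(loc=5.0, size=(6, 3))
platform_b = rng.normal(loc=8.0, size=(4, 3))
expr = np.vstack([platform_a, platform_b])
batches = np.array(["A"] * 6 + ["B"] * 4)

corrected = center_by_batch(expr, batches)
```

    Mean centering also removes genuine biological differences when class proportions differ between batches, which is one reason more elaborate correction methods are compared in studies like this one.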

Holger Fröhlich - One of the best experts on this subject based on the ideXlab platform.

  • Network and Data Integration for Biomarker Signature Discovery via Network Smoothed T-Statistics
    PLoS ONE, 2013
    Co-Authors: Yupeng Cun, Holger Fröhlich
    Abstract:

    Predictive, stable and interpretable gene signatures are generally seen as an important step towards better personalized medicine. During the last decade, various methods have been proposed for that purpose. However, one important obstacle to making gene signatures a standard tool in clinics is the typically low reproducibility of signatures, combined with the difficulty of achieving a clear biological interpretation. For this reason, in recent years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Here we propose a technique that integrates network information as well as different kinds of experimental data (here exemplified by mRNA and miRNA expression) into one classifier. This is done by smoothing t-statistics of individual genes or miRNAs over the structure of a combined protein-protein interaction (PPI) and miRNA-target gene network. A permutation test is conducted to select features in a highly consistent manner, and subsequently a Support Vector Machine (SVM) classifier is trained. Compared to several competing methods, our algorithm shows an overall better prediction performance for early versus late disease relapse and a higher signature stability. Moreover, the obtained gene lists can be clearly associated with biological knowledge, such as known disease genes and KEGG pathways. We demonstrate that our data integration strategy can improve classification performance compared to using a single data source only. Our method, called stSVM, is available in the R package netClass on CRAN (http://cran.r-project.org).
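
    The smoothing step described in this abstract can be illustrated with a small sketch: per-gene t-statistics are computed and then spread over a network by multiplying with a graph kernel. The p-step random walk kernel used below is one common choice and is an assumption here, as are the function names, the parameters `a` and `p`, the four-gene path network, and the toy expression data; the permutation test and SVM training steps are not shown.

```python
import numpy as np


def welch_t(x_a, x_b):
    """Per-gene Welch t-statistic between two sample groups (samples x genes)."""
    mean_diff = x_a.mean(axis=0) - x_b.mean(axis=0)
    se = np.sqrt(x_a.var(axis=0, ddof=1) / len(x_a) + x_b.var(axis=0, ddof=1) / len(x_b))
    return mean_diff / se


def p_step_random_walk_kernel(adjacency, a=2.0, p=2):
    """K = (a*I - L)^p with L the normalized graph Laplacian.
    Assumes every node has at least one edge."""
    degree = adjacency.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    identity = np.eye(len(adjacency))
    laplacian = identity - inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]
    return np.linalg.matrix_power(a * identity - laplacian, p)


# Toy network: four genes connected in a path 0 - 1 - 2 - 3 (illustrative only).
adjacency = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
kernel = p_step_random_walk_kernel(adjacency, a=2.0, p=2)

# Toy expression data: only gene 0 differs between the two groups.
rng = np.random.default_rng(0)
early_relapse = rng.normal(size=(10, 4))
early_relapse[:, 0] += 2.0
late_relapse = rng.normal(size=(10, 4))

t_stats = welch_t(early_relapse, late_relapse)
smoothed = kernel @ t_stats  # each gene's score now borrows from its network neighbours
```

    With this kernel, a gene directly connected to a strongly differential gene receives part of that signal, while genes more than `p` steps away receive none, which is the intuition behind ranking features by network-smoothed rather than raw statistics.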