Function Prediction

The experts below are selected from a list of 198,873 experts worldwide, ranked by the ideXlab platform.

Maozu Guo - One of the best experts on this subject based on the ideXlab platform.

  • A Literature Review of Gene Function Prediction by Modeling Gene Ontology.
    Frontiers in Genetics, 2020
    Co-Authors: Yingwen Zhao, Jun Wang, Jian Chen, Xiangliang Zhang, Maozu Guo
    Abstract:

    Annotating the functional properties of gene products, i.e., RNAs and proteins, is a fundamental task in biology. The Gene Ontology database (GO) was developed to systematically describe the functional properties of gene products across species, and to facilitate the computational prediction of gene function. As GO is routinely updated, it serves as the gold standard and main knowledge source in functional genomics. Many gene function prediction methods making use of GO have been proposed, but no literature review has summarized these methods and the possibilities for future efforts from the perspective of GO. To bridge this gap, we review the existing methods with an emphasis on recent solutions. First, we introduce the conventions of GO and the widely adopted evaluation metrics for gene function prediction. Next, we summarize current methods of gene function prediction that apply GO in different ways, such as using hierarchical or flat inter-relationships between GO terms, compressing massive GO terms and quantifying semantic similarities. Although many efforts have improved performance by harnessing GO, we conclude that there remain many largely overlooked but important topics for future research.

  • Gene Function Prediction based on Gene Ontology Hierarchy Preserving Hashing.
    Genomics, 2018
    Co-Authors: Yingwen Zhao, Jun Wang, Maozu Guo
    Abstract:

    Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate that the given gene products carry out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from the massive set of terms defined by GO is a difficult challenge. To address this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and to optimize a series of hashing functions that encode massive GO terms via compact binary codes. After that, HPHash uses these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and that it is robust to the number of hash functions. In addition, we also take HPHash as a plugin for BLAST based gene function prediction. In these experiments, HPHash again significantly improves the prediction performance. The code of HPHash is available at: http://mlda.swu.edu.cn/codes.php?name=HPHash.

  • Integrating multiple networks for protein function prediction
    BMC Systems Biology, 2015
    Co-Authors: Hailong Zhu, Carlotta Domeniconi, Maozu Guo
    Abstract:

    Background: High throughput techniques produce multiple functional association networks. Integrating these networks can enhance the accuracy of protein function prediction. Many algorithms have been introduced to generate a composite network, which is obtained as a weighted sum of individual networks. The weight assigned to an individual network reflects its benefit towards the protein functional annotation inference. A classifier is then trained on the composite network for predicting protein functions. However, since these techniques model the optimization of the composite network and the prediction tasks as separate objectives, the resulting composite network is not necessarily optimal for the follow-up protein function prediction. Results: We address this issue by modeling the optimization of the composite network and the prediction problems within a unified objective function. In particular, we use a kernel target alignment technique and the loss function of a network based classifier to jointly adjust the weights assigned to the individual networks. We show that the proposed method, called MNet, can achieve a performance that is superior (with respect to different evaluation criteria) to related techniques using the multiple networks of four example species (yeast, human, mouse, and fly) annotated with thousands (or hundreds) of GO terms. Conclusion: MNet can effectively integrate multiple networks for protein function prediction and is robust to the input parameters. Supplementary data is available at https://sites.google.com/site/guoxian85/home/mnet. The Matlab code of MNet is available upon request.
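
    The composite-network idea in this abstract (a weighted sum of individual networks, with weights reflecting how useful each network is for the labels) can be illustrated with a small sketch. This is not MNet itself: it only scores each network by kernel target alignment against a label matrix and normalizes the scores into weights, without the joint classifier loss the abstract describes. The function names, the clipping of negative alignments to zero, and the toy networks are all assumptions made for illustration.

```python
import math


def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized square matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(network, target):
    """Kernel target alignment: cosine similarity between two matrices."""
    denom = math.sqrt(frobenius_inner(network, network) * frobenius_inner(target, target))
    return frobenius_inner(network, target) / denom if denom else 0.0


def composite_network(networks, labels):
    """Weight each network by its (non-negative) alignment with y * y^T, then sum."""
    n = len(labels)
    target = [[labels[i] * labels[j] for j in range(n)] for i in range(n)]
    # Assumption: networks that disagree with the labels get weight 0.
    scores = [max(alignment(net, target), 0.0) for net in networks]
    total = sum(scores)
    if total:
        weights = [s / total for s in scores]
    else:
        weights = [1.0 / len(networks)] * len(networks)
    combined = [
        [sum(w * net[i][j] for w, net in zip(weights, networks)) for j in range(n)]
        for i in range(n)
    ]
    return weights, combined


# Toy data: proteins 0 and 1 share a function (+1), proteins 2 and 3 do not (-1).
labels = [1, 1, -1, -1]
net_consistent = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
net_noisy = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
weights, combined = composite_network([net_consistent, net_noisy], labels)
print(weights)
```

    In this toy case the network whose edges link proteins with the same label receives all of the weight, and the network that links proteins with opposite labels is dropped.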

Daisuke Kihara - One of the best experts on this subject based on the ideXlab platform.

  • Structure- and sequence-based function prediction for non-homologous proteins
    Journal of Structural and Functional Genomics, 2012
    Co-Authors: Lee Sael, Meghana Chitale, Daisuke Kihara
    Abstract:

    The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain of unknown function. In parallel with experimental methods, computational methods are expected to make a significant contribution to the functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins do not help much for these uncharacterized protein structures because they do not have apparent structural or sequence similarity with known proteins. Here, we briefly review two avenues of computational function prediction methods, i.e., structure-based methods and sequence-based methods. The focus is on our recent developments of local structure-based and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, Protein Function Prediction (PFP) and Extended Similarity Group (ESG), make use of weakly similar sequences that are conventionally discarded in homology-based function annotation. We hope that computational methods, combined with experimental methods, will make a leading contribution to the functional elucidation of these protein structures.

  • Enhanced Sequence-Based Function Prediction Methods and Application to Functional Similarity Networks
    Protein Function Prediction for Omics Era, 2011
    Co-Authors: Meghana Chitale, Daisuke Kihara
    Abstract:

    After reviewing the underlying framework required for computational function prediction in the previous chapter, we discuss two advanced sequence-based function prediction methods developed in our group, namely the Protein Function Prediction (PFP) method and the Extended Similarity Group (ESG) method. PFP extends the traditional homology search by incorporating functional associations between pairs of Gene Ontology terms based on the frequencies of their co-occurrence in the annotation of the same proteins in the database. PFP also considers sequences that are only very weakly similar to the query, thereby increasing its sensitivity and its ability to predict low-resolution functional terms. ESG, on the other hand, recursively searches the sequence similarity space around the query to find consensus annotations in the neighborhood. The last part of the chapter discusses the network structure of the gene functional space built by connecting proteins with functional similarity. Function annotation was enriched with predictions from PFP. Similarity to the structures of protein-protein interaction networks and metabolic pathway networks is discussed.
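
    The two PFP ingredients named in this abstract, weak sequence hits and co-occurrence associations between GO terms, can be sketched as below. This is a simplified illustration and not the published PFP scoring function: the `bonus` offset that keeps weak hits positive, the helper names `cooccurrence_probs` and `score_terms`, the GO identifiers and the E-values are all hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def cooccurrence_probs(db_annotations):
    """Estimate P(term_a | term_b) from how often terms annotate the same protein."""
    term_count = Counter()
    pair_count = Counter()
    for terms in db_annotations:
        term_count.update(terms)
        pair_count.update(permutations(sorted(terms), 2))

    def prob(term_a, given):
        if term_a == given:
            return 1.0
        if term_count[given] == 0:
            return 0.0
        return pair_count[(term_a, given)] / term_count[given]

    return prob


def score_terms(hits, prob, candidate_terms, bonus=2.0):
    """Sum E-value-based weights over all hits, including weak ones.

    Each hit contributes to a candidate term in proportion to how strongly
    the hit's own annotations are associated with that term.
    """
    scores = {}
    for term in candidate_terms:
        total = 0.0
        for e_value, hit_terms in hits:
            weight = -math.log10(e_value) + bonus  # assumed weighting scheme
            total += sum(weight * prob(term, t) for t in hit_terms)
        scores[term] = total
    return scores


# Toy annotation database: one set of GO terms per annotated protein.
db_annotations = [{"GO:1", "GO:2"}, {"GO:1"}, {"GO:2"}, {"GO:3"}]
prob = cooccurrence_probs(db_annotations)

# One strong hit and one weak hit (E-value 10) that a strict cutoff would discard.
hits = [(1e-30, {"GO:1"}), (10.0, {"GO:2"})]
scores = score_terms(hits, prob, ["GO:1", "GO:2", "GO:3"])
print(scores)
```

    In this toy run GO:2 receives a substantial score even though only the weak hit is annotated with it directly, because it co-occurs with GO:1 in the database.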

  • ESG: extended similarity group method for automated protein function prediction
    Bioinformatics, 2009
    Co-Authors: Meghana Chitale, Troy Hawkins, Changsoon Park, Daisuke Kihara
    Abstract:

    Motivation: The importance of accurate automatic protein function prediction is ever increasing in the face of a large number of newly sequenced genomes and proteomics data that are awaiting biological interpretation. Conventional methods have focused on high sequence similarity-based annotation transfer, which relies on the concept of homology. However, many cases have been reported in which simple transfer of function from the top hits of a homology search causes erroneous annotation. New methods are required to handle sequence similarity in a more robust way, combining signals from strongly and weakly similar proteins to predict function for unknown proteins with high reliability. Results: We present the extended similarity group (ESG) method, which performs iterative sequence database searches and annotates a query sequence with Gene Ontology terms. Each annotation is assigned a probability based on its relative similarity score with the multiple-level neighbors in the protein similarity graph. We show how the statistical framework of ESG improves the prediction accuracy by iteratively taking into account the neighborhood of the query protein in the sequence similarity space. ESG outperforms conventional PSI-BLAST and the protein function prediction (PFP) algorithm. The iterative search is found to be effective in capturing multiple domains in a query protein, enabling accurate prediction of several functions that originate from different domains. Availability: The ESG web server is available for automated protein function prediction at http://dragon.bio.purdue.edu/ESG/ Contact: cspark@cau.ac.kr; dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online.

  • New paradigm in protein function prediction for large scale omics analysis
    Molecular BioSystems, 2008
    Co-Authors: Troy Hawkins, Meghana Chitale, Daisuke Kihara
    Abstract:

    Biological interpretation of large scale omics data, such as protein-protein interaction data and microarray gene expression data, requires that the function of many genes in a data set is annotated or predicted. Here the predicted function for a gene does not necessarily have to be a detailed biochemical function; a broad class of function, or low-resolution function, may be sufficient to understand why a set of genes shows the observed expression pattern or interaction pattern. In this Highlight, we focus on two recent approaches that aim to provide large coverage in function prediction, namely omics data driven approaches and a thorough data mining approach on homology search results.

  • Function prediction of uncharacterized proteins.
    Journal of Bioinformatics and Computational Biology, 2007
    Co-Authors: Troy Hawkins, Daisuke Kihara
    Abstract:

    Function prediction of uncharacterized protein sequences generated by genome projects has emerged as an important focus for computational biology. We have categorized several approaches beyond traditional sequence similarity that utilize the overwhelmingly large amounts of available data for computational function prediction, including structure-, association (genomic context)-, interaction (cellular context)-, process (metabolic context)-, and proteomics-experiment-based methods. Because they incorporate structural and experimental data that is not used in sequence-based methods, they can provide additional accuracy and reliability to protein function prediction. Here, we first review the definition of protein function. Then the recent developments of these methods are introduced with special focus on the type of predictions that can be made. The need for further development of comprehensive systems biology techniques that can utilize the ever-increasing data presented by the genomics and proteomics communities is emphasized. For the readers' convenience, tables of useful online resources in each category are included. The role of computational scientists in the near future of biological research and the interplay between computational and experimental biology are also addressed.

Vipin Kumar - One of the best experts on this subject based on the ideXlab platform.

  • Computational Approaches to Protein Function Prediction
    2012
    Co-Authors: Gaurav Pandey, Vipin Kumar, Michael Steinbach, Chad L. Myers
    Abstract:

    This book provides a comprehensive overview of the field of automated protein function prediction. It covers many techniques for solving this problem by computational means and discusses the most important principles underlying these techniques. By clearly describing a wide variety of automated techniques for protein function prediction and summarizing the main concepts behind these techniques, this book greatly reduces the time and effort required to understand the problem of protein function prediction and the numerous bioinformatics solutions that have been developed for it.

  • Data mining techniques for enhancing protein function prediction
    2010
    Co-Authors: Vipin Kumar, Gaurav Pandey
    Abstract:

    Proteins are the most essential and versatile macromolecules of life, and knowledge of their functions is crucial for obtaining a basic understanding of the cellular processes operating in an organism as well as for important applications in biotechnology, such as the development of new drugs, better crops, and synthetic biochemicals such as biofuels. Recent revolutions in biotechnology have given us numerous high-throughput experimental technologies that generate very useful data, such as gene expression and protein interaction data, that provide high-resolution snapshots of complex cellular processes and a novel avenue to understand their underlying mechanisms. In particular, several computational approaches based on the principle of Guilt by Association (GBA) have been proposed, in which the function(s) of a protein are inferred from those of other proteins that are "associated" with it in these data sets. In this thesis, we have developed several novel methods for improving the performance of these approaches by making use of the unutilized and under-utilized information in genomic data sets, as well as their associated knowledge bases. In particular, we have developed pre-processing methods for handling data quality issues with gene expression (microarray) data sets and protein interaction networks that aim to enhance the utility of these data sets for protein function prediction. We have also developed a method for incorporating the inter-relationships between functional classes, as captured by the ontologies in Gene Ontology, into classification-based protein function prediction algorithms, which enabled us to improve the quality of predictions made for several functional classes, particularly those with very few member proteins (rare classes). Finally, we have developed a novel association analysis-based biclustering algorithm to address two major challenges with traditional biclustering algorithms, namely an exhaustive search of all valid biclusters satisfying the definition specified by the algorithm, and the ability to search for small biclusters. This algorithm makes it possible to discover smaller biclusters that are more significantly enriched with specific GO terms than those produced by traditional biclustering algorithms. Overall, the methods proposed in this thesis are expected to help uncover the functions of several unannotated proteins (or genes), as shown by specific examples cited in some of the chapters. To conclude, we also suggest several opportunities for further progress on the very important problem of protein function prediction.

  • Incorporating functional inter-relationships into protein function prediction algorithms
    BMC Bioinformatics, 2009
    Co-Authors: Gaurav Pandey, Chad L Myers, Vipin Kumar
    Abstract:

    Background: Functional classification schemes (e.g. the Gene Ontology) that serve as the basis for annotation efforts in several organisms are often the source of gold standard information for computational efforts at supervised protein function prediction. While successful function prediction algorithms have been developed, few previous efforts have utilized more than the protein-to-functional class label information provided by such knowledge bases. For instance, the Gene Ontology not only captures protein annotations to a set of functional classes, but it also arranges these classes in a DAG-based hierarchy that captures rich inter-relationships between different classes. These inter-relationships present both opportunities, such as the potential for additional training examples for small classes from larger related classes, and challenges, such as a harder-to-learn distinction between similar GO terms, for standard classification-based approaches. Results: We propose a method to enhance the performance of classification-based protein function prediction algorithms by addressing the issue of using these inter-relationships between the functional classes constituting functional classification schemes. Using a standard measure for evaluating the semantic similarity between nodes in an ontology, we quantify and incorporate these inter-relationships into the k-nearest neighbor classifier. We present experiments on several large genomic data sets, each of which is used for the modeling and prediction of over a hundred classes from the GO Biological Process ontology. The results show that this incorporation produces more accurate predictions for a large number of the functional classes considered, and also that the classes that benefit most from this approach are those containing the fewest members. In addition, we show how our proposed framework can be used for integrating information from the entire GO hierarchy for improving the accuracy of predictions made over a set of base classes. Finally, we provide qualitative and quantitative evidence that this incorporation of functional inter-relationships enables the discovery of interesting biology in the form of novel functional annotations for several yeast proteins, such as Sna4, Rtn1 and Lin1. Conclusion: We implemented and evaluated a methodology for incorporating inter-relationships between functional classes into a standard classification-based protein function prediction algorithm. Our results show that this incorporation can help improve the accuracy of such algorithms, and help uncover novel biology in the form of previously unknown functional annotations. The complete source code, a sample data set and the additional files for this paper are available free of charge for non-commercial use at http://www.cs.umn.edu/vk/gaurav/Functionalsimilarity/.
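
    The idea of folding semantic similarity between classes into a k-nearest neighbor classifier can be sketched as follows. The abstract does not give the exact formulation, so the combination rule here is an assumption: each neighbor contributes its similarity to the query, scaled by the highest semantic similarity between the target class and any class the neighbor is annotated with. All names and numbers in the example are hypothetical.

```python
def semantic_sim(class_sim, a, b):
    """Symmetric lookup; identical classes score 1.0, unlisted pairs score 0.0."""
    if a == b:
        return 1.0
    return class_sim.get((a, b), class_sim.get((b, a), 0.0))


def knn_score(target, query_sims, annotations, class_sim, k=2, use_semantics=True):
    """Score one target class for a query protein from its k most similar proteins."""
    neighbors = sorted(query_sims, key=query_sims.get, reverse=True)[:k]
    score = 0.0
    for protein in neighbors:
        if use_semantics:
            # Partial credit when the neighbor carries a semantically related class.
            credit = max(
                (semantic_sim(class_sim, target, c) for c in annotations[protein]),
                default=0.0,
            )
        else:
            # Plain kNN: credit only for an exact class match.
            credit = 1.0 if target in annotations[protein] else 0.0
        score += query_sims[protein] * credit
    return score


# Toy data: similarity of three annotated proteins to the query protein.
query_sims = {"P1": 0.9, "P2": 0.8, "P3": 0.1}
annotations = {"P1": {"GO:A"}, "P2": {"GO:B"}, "P3": {"GO:A"}}
class_sim = {("GO:A", "GO:B"): 0.5}  # assumed semantic similarity between classes

plain = knn_score("GO:A", query_sims, annotations, class_sim, use_semantics=False)
with_semantics = knn_score("GO:A", query_sims, annotations, class_sim)
print(plain, with_semantics)
```

    With semantics enabled, the neighbor annotated with the related class GO:B adds evidence for GO:A. This mirrors the opportunity described in the abstract: small classes can borrow training signal from larger related classes.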

Gaurav Pandey - One of the best experts on this subject based on the ideXlab platform.

  • Computational Approaches to Protein Function Prediction
    2012
    Co-Authors: Gaurav Pandey, Vipin Kumar, Michael Steinbach, Chad L. Myers
    Abstract:

    This book provides a comprehensive overview of the field of automated protein function prediction. It covers many techniques for solving this problem by computational means and discusses the most important principles underlying these techniques. By clearly describing a wide variety of automated techniques for protein function prediction and summarizing the main concepts behind these techniques, this book greatly reduces the time and effort required to understand the problem of protein function prediction and the numerous bioinformatics solutions that have been developed for it.

  • Data mining techniques for enhancing protein function prediction
    2010
    Co-Authors: Vipin Kumar, Gaurav Pandey
    Abstract:

    Proteins are the most essential and versatile macromolecules of life, and knowledge of their functions is crucial for obtaining a basic understanding of the cellular processes operating in an organism as well as for important applications in biotechnology, such as the development of new drugs, better crops, and synthetic biochemicals such as biofuels. Recent revolutions in biotechnology have given us numerous high-throughput experimental technologies that generate very useful data, such as gene expression and protein interaction data, that provide high-resolution snapshots of complex cellular processes and a novel avenue to understand their underlying mechanisms. In particular, several computational approaches based on the principle of Guilt by Association (GBA) have been proposed, in which the function(s) of a protein are inferred from those of other proteins that are "associated" with it in these data sets. In this thesis, we have developed several novel methods for improving the performance of these approaches by making use of the unutilized and under-utilized information in genomic data sets, as well as their associated knowledge bases. In particular, we have developed pre-processing methods for handling data quality issues with gene expression (microarray) data sets and protein interaction networks that aim to enhance the utility of these data sets for protein function prediction. We have also developed a method for incorporating the inter-relationships between functional classes, as captured by the ontologies in Gene Ontology, into classification-based protein function prediction algorithms, which enabled us to improve the quality of predictions made for several functional classes, particularly those with very few member proteins (rare classes). Finally, we have developed a novel association analysis-based biclustering algorithm to address two major challenges with traditional biclustering algorithms, namely an exhaustive search of all valid biclusters satisfying the definition specified by the algorithm, and the ability to search for small biclusters. This algorithm makes it possible to discover smaller biclusters that are more significantly enriched with specific GO terms than those produced by traditional biclustering algorithms. Overall, the methods proposed in this thesis are expected to help uncover the functions of several unannotated proteins (or genes), as shown by specific examples cited in some of the chapters. To conclude, we also suggest several opportunities for further progress on the very important problem of protein function prediction.

  • Incorporating functional inter-relationships into protein function prediction algorithms
    BMC Bioinformatics, 2009
    Co-Authors: Gaurav Pandey, Chad L Myers, Vipin Kumar
    Abstract:

    BACKGROUND: Functional classification schemes (e.g. the Gene Ontology) that serve as the basis for annotation efforts in several organisms are often the source of gold standard information for computational efforts at supervised protein Function Prediction. While successful Function Prediction algorithms have been developed, few previous efforts have utilized more than the protein-to-Functional class label information provided by such knowledge bases. For instance, the Gene Ontology not only captures protein annotations to a set of Functional classes, but it also arranges these classes in a DAG-based hierarchy that captures rich inter-relationships between different classes. These inter-relationships present both opportunities, such as the potential for additional training examples for small classes from larger related classes, and challenges, such as a harder to learn distinction between similar GO terms, for standard classification-based approaches.

    RESULTS: We propose a method to enhance the performance of classification-based protein Function Prediction algorithms by addressing the issue of using these interrelationships between Functional classes constituting Functional classification schemes. Using a standard measure for evaluating the semantic similarity between nodes in an ontology, we quantify and incorporate these inter-relationships into the k-nearest neighbor classifier. We present experiments on several large genomic data sets, each of which is used for the modeling and Prediction of over a hundred classes from the GO Biological Process ontology. The results show that this incorporation produces more accurate Predictions for a large number of the Functional classes considered, and also that the classes benefitted most by this approach are those containing the fewest members. In addition, we show how our proposed framework can be used for integrating information from the entire GO hierarchy for improving the accuracy of Predictions made over a set of base classes. Finally, we provide qualitative and quantitative evidence that this incorporation of Functional inter-relationships enables the discovery of interesting biology in the form of novel Functional annotations for several yeast proteins, such as Sna4, Rtn1 and Lin1.

    CONCLUSION: We implemented and evaluated a methodology for incorporating interrelationships between Functional classes into a standard classification-based protein Function Prediction algorithm. Our results show that this incorporation can help improve the accuracy of such algorithms, and help uncover novel biology in the form of previously unknown Functional annotations. The complete source code, a sample data set and the additional files for this paper are available free of charge for non-commercial use at http://www.cs.umn.edu/vk/gaurav/Functionalsimilarity/.
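
The idea of folding class inter-relationships into a k-nearest neighbor classifier can be illustrated with a small sketch. This is not the authors' implementation: the cosine similarity between proteins, the max-over-labels combination rule, and all names (`knn_scores_with_label_similarity`, `label_sim`) are assumptions made for illustration. The abstract only states that a standard semantic similarity measure between ontology nodes is incorporated into kNN.

```python
import numpy as np

def knn_scores_with_label_similarity(X_train, Y_train, x_query, label_sim, k=3):
    """Score every class for one query protein (illustrative sketch only).

    X_train:   (n, d) feature matrix of annotated proteins.
    Y_train:   (n, c) binary matrix, Y_train[i, j] = 1 if protein i has class j.
    label_sim: (c, c) semantic similarity between classes, 1.0 on the diagonal
               (assumed to be precomputed from the ontology).
    """
    # Cosine similarity between the query and every training protein.
    norms = np.linalg.norm(X_train, axis=1) * np.linalg.norm(x_query)
    sims = (X_train @ x_query) / np.where(norms == 0, 1.0, norms)
    # Indices of the k most similar training proteins.
    nearest = np.argsort(-sims)[:k]
    # For each neighbor and each target class, take the highest similarity
    # between the target class and any class the neighbor is annotated with.
    contrib = (Y_train[nearest][:, :, None] * label_sim[None, :, :]).max(axis=1)
    # Weight each neighbor's contribution by its similarity to the query.
    return sims[nearest] @ contrib
```

With an identity `label_sim` this reduces to ordinary similarity-weighted kNN voting; off-diagonal entries let a neighbor annotated with a related class lend partial support to a small class, which is the opportunity the abstract describes.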

Renzhi Cao - One of the best experts on this subject based on the ideXlab platform.

  • HMMeta: Protein Function Prediction using Hidden Markov Models
    Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2020
    Co-Authors: Sola Gbenro, Kyle Hippe, Renzhi Cao
    Abstract:

    As the body of genomic product data increases at a much faster rate than can be annotated, computational analysis of protein Function has never been more important. In this research, we introduce a novel protein Function Prediction method HMMeta, which is based on the prominent natural language Prediction technique Hidden Markov Models (HMM). With a new representation of protein sequence as a language, we trained a unique HMM for each Gene Ontology (GO) term taken from the UniProt database, which in total has 27,451 unique GO IDs leading to the creation of 27,451 Hidden Markov Models. We employed data augmentation to artificially inflate the number of protein sequences associated with GO terms that have a limited amount in the database, and this helped to balance the number of protein sequences associated with each GO term. Predictions are made by running the sequence against each model created. The models within eighty percent of the top scoring model, or 75 models with the highest scores, whichever is less, represent the Functions that are most associated with the given sequence. We benchmarked our method in the latest Critical Assessment of protein Function Annotation (CAFA 4) experiment as CaoLab2, and we also evaluated HMMeta against several other protein Function Prediction methods against a subset of the UniProt database. HMMeta achieved favorable results as a sequence-based method, and outperforms a few notable methods in some categories through our evaluation, which shows great potential for automated protein Function Prediction. The tool is available at https://github.com/KPHippe/HMM-For-Protein-Prediction.
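
The selection rule described in the abstract (models within eighty percent of the top score, or the 75 best, whichever is fewer) can be sketched as follows. This is one possible reading, not the HMMeta code: it assumes that scores are positive with higher meaning better, and that "within eighty percent" means a score of at least 0.8 times the top score. The function name `select_go_terms` and its parameters are hypothetical.

```python
def select_go_terms(scores, fraction=0.8, max_terms=75):
    """Pick GO terms whose model score is close to the best score.

    scores: dict mapping a GO term identifier to the score of its model for
            one query sequence (assumed positive, higher is better).
    """
    if not scores:
        return []
    # Rank terms from best to worst score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_score = ranked[0][1]
    # Keep terms scoring at least `fraction` of the top score.
    within = [term for term, score in ranked if score >= fraction * top_score]
    # Cap the list at `max_terms`, so the smaller of the two sets is returned.
    return within[:max_terms]
```

If the underlying scores were log-likelihoods (which are negative), the threshold would need a different definition; the abstract does not specify this detail.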

  • SMISS: a protein Function Prediction server by integrating multiple sources
    International Journal of Computational Intelligence in Bioinformatics and Systems Biology, 2020
    Co-Authors: Renzhi Cao, Zhaolong Zhong, Jianlin Cheng
    Abstract:

    SMISS is a novel web server for protein Function Prediction. Three different predictors can be selected for different usage. It integrates different sources to improve the protein Function Prediction accuracy, including the query protein sequence, protein-protein interaction network, gene-gene interaction network and the rules mined from protein Function associations. SMISS automatically switches to ab initio protein Function Prediction based on the query sequence when there are no homologs in the database. It takes FASTA format sequences as input, and several sequences can be submitted together without influencing the computation speed too much. PHP and Perl are the two primary programming languages used in the server. The CodeIgniter MVC PHP web framework and the Bootstrap front-end framework are used for building the server. It can be used on different platforms in a standard web browser, such as Windows, Mac OS X, Linux and iOS. No plug-ins are needed for our website (availability: http://tulip.rnet.missouri.edu/profunc/).

  • ProLanGO: protein Function Prediction using neural machine translation based on a recurrent neural network
    Molecules, 2017
    Co-Authors: Renzhi Cao, Colton Freitas, Leong Chan, Miao Sun, Haiqing Jiang, Zhangxin Chen
    Abstract:

    With the development of next generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from protein sequences because of limitations of traditional biological experimental techniques. Protein Function Prediction has been a long standing challenge to fill the gap between the huge amount of protein sequences and the known Function. In this paper, we propose a novel method to convert the protein Function Prediction problem into a language translation problem, from the newly proposed protein sequence language “ProLan” to the protein Function language “GOLan”, and build a neural machine translation model based on recurrent neural networks to translate the “ProLan” language into the “GOLan” language. We blindly tested our method by participating in the third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated the performance of our method on selected proteins whose Function was released after the CAFA competition. The good performance on the training and testing datasets demonstrates that our newly proposed method is a promising direction for protein Function Prediction. In summary, we propose for the first time a method that converts the protein Function Prediction problem to a language translation problem and applies a neural machine translation model for protein Function Prediction.

  • SMISS: A protein Function Prediction server by integrating multiple sources
    arXiv: Genomics, 2016
    Co-Authors: Renzhi Cao, Zhaolong Zhong, Jianlin Cheng
    Abstract:

    SMISS is a novel web server for protein Function Prediction. Three different predictors can be selected for different usage. It integrates different sources to improve the protein Function Prediction accuracy, including the query protein sequence, protein-protein interaction network, gene-gene interaction network, and the rules mined from protein Function associations. SMISS automatically switches to ab initio protein Function Prediction based on the query sequence when there are no homologs in the database. It takes FASTA format sequences as input, and several sequences can be submitted together without influencing the computation speed too much. PHP and Perl are the two primary programming languages used in the server. The CodeIgniter MVC PHP web framework and the Bootstrap front-end framework are used for building the server. It can be used on different platforms in a standard web browser, such as Windows, Mac OS X, Linux, and iOS. No plugins are needed for our website. Availability: this http URL.

  • Integrated protein Function Prediction by mining Function associations, sequences, and protein-protein and gene-gene interaction networks.
    Methods (San Diego Calif.), 2015
    Co-Authors: Renzhi Cao, Jianlin Cheng
    Abstract:

    Motivations: Protein Function Prediction is an important and challenging problem in bioinformatics and computational biology. Functionally relevant biological information such as protein sequences, gene expression, and protein-protein interactions has been used mostly separately for protein Function Prediction. One of the major challenges is how to effectively integrate multiple sources of both traditional and new information, such as spatial gene–gene interaction networks generated from chromosomal conformation data, to improve protein Function Prediction.