Functional Similarity

The experts below are selected from a list of 16,788 experts worldwide, ranked by the ideXlab platform.

Zhixia Teng - One of the best experts on this subject based on the ideXlab platform.

  • An improved method for functional similarity analysis of genes based on gene ontology
    BMC Systems Biology, 2016
    Co-Authors: Zhen Tian, Chunyu Wang, Zhixia Teng
    Abstract:

    Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interactions, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of Gene Ontology (GO) terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms and the information content (IC) of a term set. Estimating gene functional similarity reliably therefore remains a challenging problem. We propose WIS, an effective method for measuring gene functional similarity. First, WIS computes the IC of a term from its depth, the number of its ancestors and the topology of its descendants in the GO graph. Second, WIS calculates the IC of a term set by considering the weighted inherited semantics of terms. Finally, WIS estimates gene functional similarity from the IC overlap ratio of term sets. WIS outperforms several other representative measures in experiments on functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS fully takes into account both the specificity of terms and the weighted inherited semantics between GO terms. The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/.
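
    The abstract does not give WIS's exact term-IC formula, which combines depth, ancestor count and descendant topology. The sketch below shows what a topology-based ("intrinsic") IC looks like. It uses the simpler descendant-count formula of Seco et al. as a stand-in, not WIS itself: IC(t) = 1 - log(|desc(t)| + 1) / log(N). The toy DAG and its term names are hypothetical.

```python
import math

# Hypothetical toy GO-like DAG: child term -> list of parent terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
    "E": ["D"],
}

def children_map(parents):
    """Invert the child->parents map into parent->children."""
    kids = {t: [] for t in parents}
    for child, ps in parents.items():
        for p in ps:
            kids[p].append(child)
    return kids

def descendants(term, kids):
    """All terms reachable below `term`, excluding the term itself."""
    seen, stack = set(), list(kids[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(kids[t])
    return seen

def intrinsic_ic(parents):
    """Seco-style intrinsic IC (a stand-in, not WIS): leaves score 1, the root scores 0."""
    kids = children_map(parents)
    n = len(parents)
    return {t: 1.0 - math.log(len(descendants(t, kids)) + 1) / math.log(n)
            for t in parents}

ic = intrinsic_ic(PARENTS)
```

    Leaf terms get the maximum IC of 1 and the root gets 0. WIS additionally folds in depth and ancestor counts, which this sketch omits.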

  • SGFSC: speeding the gene functional similarity calculation based on hash tables
    BMC Bioinformatics, 2016
    Co-Authors: Zhen Tian, Chunyu Wang, Zhixia Teng
    Abstract:

    In recent years, many measures of gene functional similarity have been proposed and widely used in many kinds of essential research. These methods fall mainly into two categories: pairwise approaches and group-wise approaches. However, a common problem with these methods is their time consumption, especially when measuring the functional similarities of a large number of gene pairs. Computational efficiency is an even more prominent problem for pairwise approaches, because they depend on combining semantic similarity scores. Efficient measurement of gene functional similarity therefore remains a challenging problem. To speed up current gene functional similarity calculation methods, a novel two-step computing strategy is proposed: (1) build a hash table for each method to store the essential information obtained from the Gene Ontology (GO) graph, and (2) measure gene functional similarity based on the corresponding hash table. With the hash table, each method no longer needs to traverse the GO graph repeatedly. Time-complexity analysis shows that the computational efficiency of these methods is significantly improved. We also implement a novel Speeding Gene Functional Similarity Calculation tool, SGFSC, which bundles seven typical measures using the proposed strategy. Further experiments show the clear advantage of SGFSC in measuring gene functional similarity on a whole-genome scale. The proposed strategy successfully speeds up current gene functional similarity calculation methods. SGFSC is an efficient tool that is freely available at http://nclab.hit.edu.cn/SGFSC. The source code of SGFSC can be downloaded from http://pan.baidu.com/s/1dFFmvpZ.
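
    The two-step idea can be sketched with a Python dict as the hash table. Step 1 precomputes each term's ancestor set in one pass over the GO graph. Step 2 answers every gene-pair query by lookups alone. The toy DAG is hypothetical. The Jaccard overlap in step 2 is only a placeholder measure, not one of the seven measures bundled in SGFSC.

```python
# Hypothetical toy DAG (child -> parents); real GO graphs have tens of thousands of terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
    "E": ["D"],
}

def build_ancestor_table(parents):
    """Step 1: a single pass over the DAG, storing each term's ancestors (itself included)."""
    table = {}

    def visit(term):
        if term not in table:
            anc = {term}
            for p in parents[term]:
                anc |= visit(p)
            table[term] = frozenset(anc)
        return table[term]

    for t in parents:
        visit(t)
    return table

def gene_similarity(terms_a, terms_b, table):
    """Step 2: compare two genes using dictionary lookups only (placeholder Jaccard measure)."""
    a = set().union(*(table[t] for t in terms_a))
    b = set().union(*(table[t] for t in terms_b))
    union = a | b
    return len(a & b) / len(union) if union else 0.0

table = build_ancestor_table(PARENTS)       # built once
sim = gene_similarity({"C"}, {"E"}, table)  # reused for every gene pair
```

    The graph is traversed once. Every later comparison costs only set operations on cached entries, which is the kind of saving the abstract describes at genome scale.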

  • Measuring gene functional similarity based on group-wise comparison of GO terms
    Bioinformatics, 2013
    Co-Authors: Zhixia Teng, Chunyu Wang, Ping Xuan
    Abstract:

    Motivation: Compared with sequence and structure similarity, functional similarity is more informative for understanding the biological roles and functions of genes. Many important applications in computational molecular biology require functional similarity, such as gene clustering, protein function prediction, protein interaction evaluation and disease gene prioritization. Gene Ontology (GO) is now widely used as the basis for measuring gene functional similarity. Some existing methods combine the semantic similarity scores of single term pairs to estimate gene functional similarity, whereas others compare terms in groups. However, these methods may make error-prone judgments about gene functional similarity, and measuring it reliably remains a challenge. Results: We propose a novel method called SORA to measure gene functional similarity in the GO context. First, SORA computes the information content (IC) of a term using its semantic specificity and coverage. Second, SORA measures the IC of a term set by combining the inherited and extended IC of its terms based on the structure of GO. Finally, SORA estimates gene functional similarity from the IC overlap ratio of term sets. SORA is evaluated against five state-of-the-art methods in the field on the public platform for collaborative evaluation of GO-based semantic similarity measures. Careful comparisons show that SORA is generally superior to the other methods. Further analysis suggests that it benefits primarily from the structure of GO, which encodes expressive information about gene function. SORA offers an effective and reliable way to compare gene function. Availability: The web service of SORA is freely available at http://nclab.hit.edu.cn/SORA/. Contact: maozuguo@hit.edu.cn
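
    SORA's final step compares two genes by an IC overlap ratio. It can be illustrated as IC(A ∩ B) / IC(A ∪ B) over the ancestor-closed annotation sets of the two genes. This is a minimal sketch. The IC values and ancestor table are made up. The IC of a term set is simplified to the sum of its members' IC, so SORA's combination of inherited and extended IC is not reproduced.

```python
import math

# Hypothetical per-term IC values and ancestor table (term -> ancestors, itself included).
IC = {"root": 0.0, "A": 0.8, "B": 0.9, "C": 2.1, "D": 1.7, "E": 2.5}
ANCESTORS = {
    "root": {"root"},
    "A": {"A", "root"},
    "B": {"B", "root"},
    "C": {"C", "A", "root"},
    "D": {"D", "A", "B", "root"},
    "E": {"E", "D", "A", "B", "root"},
}

def closure(terms):
    """Ancestor-closed set of a gene's annotation terms."""
    out = set()
    for t in terms:
        out |= ANCESTORS[t]
    return out

def set_ic(terms):
    """IC of a term set, simplified here to the (order-independent) sum of member ICs."""
    return math.fsum(IC[t] for t in terms)

def ic_overlap_ratio(terms_a, terms_b):
    """Shared IC divided by total IC of the two closed term sets."""
    a, b = closure(terms_a), closure(terms_b)
    union_ic = set_ic(a | b)
    return set_ic(a & b) / union_ic if union_ic > 0 else 0.0

ratio = ic_overlap_ratio({"C"}, {"E"})
```

    Identical annotation sets score 1. Genes that share only the zero-IC root score 0.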

Xing Chen - One of the best experts on this subject based on the ideXlab platform.

  • Computational models for lncRNA function prediction and functional similarity calculation
    Briefings in Functional Genomics, 2018
    Co-Authors: Xing Chen, Na-na Guan, Jia Qu, Zhi-an Huang, Jianqiang Li
    Abstract:

    From transcriptional noise to dark matter of biology, the rapidly changing view of long non-coding RNAs (lncRNAs) is leading to a deeper understanding of human complex diseases induced by abnormal lncRNA expression. There is an urgent need to discern the potential functional roles of lncRNAs. This is needed for further study of the pathology, diagnosis, therapy, prognosis and prevention of human complex diseases, and for disease biomarker detection at the lncRNA level. Computational models are anticipated to be an effective way to combine current related databases, both to predict the most likely lncRNA functions and to calculate lncRNA functional similarity on a large scale. In this review, we first illustrate the biological functions of lncRNAs in five biological processes. We then briefly depict the relationship between mutations or dysfunctions of lncRNAs and human complex diseases, including cancers, nervous system disorders and others. Next, 17 publicly available lncRNA function-related databases containing four types of functional information are introduced. Based on these databases, dozens of computational models have been developed to help characterize the functional roles of lncRNAs. We therefore systematically describe and classify 16 lncRNA function prediction models and 9 lncRNA functional similarity calculation models into 8 types, highlighting their core algorithms and processes. Finally, we conclude with a discussion of the advantages and limitations of these computational models and of future directions for lncRNA function prediction and functional similarity calculation. We believe that constructing systematic functional annotation systems is essential to improving the prediction accuracy of computational models, which will accelerate the identification of novel lncRNA functions in the future.

  • FMLNCSIM: fuzzy measure-based lncRNA functional similarity calculation model
    Oncotarget, 2016
    Co-Authors: Xing Chen, Yuan Huang, Xuesong Wang, Keith C C Chan
    Abstract:

    Xing Chen (1,*), Yu-An Huang (2,*), Xue-Song Wang (1), Zhu-Hong You (3), Keith C.C. Chan (2). (1) School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, China; (2) Department of Computing, Hong Kong Polytechnic University, Hong Kong; (3) School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China. (*) The first two authors should be regarded as joint first authors. Correspondence to: Xing Chen, email: xingchen@amss.ac.cn; Zhu-Hong You, email: zhuhongyou@gmail.com. Keywords: lncRNAs, functional similarity, disease, fuzzy measure, directed acyclic graph. Received: April 05, 2016; Accepted: May 29, 2016; Published: June 14, 2016.

    Accumulating experimental studies have indicated the influence of lncRNAs on various critical biological processes, as well as on disease development and progression. Calculating lncRNA functional similarity is of high value for inferring lncRNA functions and identifying potential lncRNA-disease associations. However, little effort has been made to measure the functional similarity among lncRNAs on a large scale. In this study, we developed a Fuzzy Measure-based LNCRNA Functional SIMilarity calculation model (FMLNCSIM) based on the assumption that functionally similar lncRNAs tend to be associated with similar diseases. The performance improvement of FMLNCSIM comes mainly from combining information content with the concept of fuzzy measure, which was applied to the directed acyclic graphs of disease MeSH descriptors. To evaluate the effectiveness of FMLNCSIM, we further combined it with the previously proposed model of Laplacian Regularized Least Squares for lncRNA-Disease Association (LRLSLDA). The integrated model, LRLSLDA-FMLNCSIM, achieved good performance in two evaluation frameworks. In global leave-one-out cross validation (LOOCV), it reached AUCs of 0.8266 and 0.9338 on the LncRNADisease and MNDR databases, respectively. In 5-fold cross validation, it reached average AUCs of 0.7979 and 0.9237 on the LncRNADisease and MNDR databases, respectively. These results significantly improve on the performance of previous classical models. It is anticipated that FMLNCSIM could be used to search for functionally similar lncRNAs and to infer lncRNA functions in future research.
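
    The AUC values reported above summarize how well held-out known associations are ranked above unconfirmed candidates. The sketch below computes AUC by its rank-based (Mann-Whitney) definition, as a reminder of what the number measures. The scores are invented. The global LOOCV bookkeeping, which decides the candidate pairs in each fold, is not shown.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Invented scores: held-out known associations vs. unconfirmed candidate pairs.
value = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
```

    This pairwise loop is quadratic and meant only for illustration. Sorting-based rank computations give the same value faster on large candidate sets.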

  • ILNCSIM: improved lncRNA functional similarity calculation model
    Oncotarget, 2016
    Co-Authors: Yuan Huang, Xing Chen, Deshuang Huang, Keith C C Chan
    Abstract:

    Increasing observations have indicated that lncRNAs play a significant role in various critical biological processes and in the development and progression of various human diseases. Constructing lncRNA functional similarity networks could benefit the development of computational models for inferring lncRNA functions and identifying lncRNA-disease associations. However, little effort has been devoted to quantifying lncRNA functional similarity. In this study, we developed an Improved LNCRNA functional SIMilarity calculation model (ILNCSIM) based on the assumption that lncRNAs with similar biological functions tend to be involved in similar diseases. The main improvement comes from combining the concept of information content with the hierarchical structure of disease directed acyclic graphs for disease similarity calculation. ILNCSIM was combined with the previously proposed model of Laplacian Regularized Least Squares for lncRNA-Disease Association to further evaluate its performance. The new model obtained reliable performance in leave-one-out cross validation, with AUCs of 0.9316 and 0.9074 on the MNDR and Lnc2cancer databases, respectively. In 5-fold cross validation, it obtained AUCs of 0.9221 and 0.9033 on the MNDR and Lnc2cancer databases, respectively. These results significantly improve on the prediction performance of previous models. It is anticipated that ILNCSIM could serve as an effective lncRNA function prediction model for future biomedical research.

  • Predicting lncRNA-disease associations and constructing lncRNA functional similarity network based on the information of miRNA
    Scientific Reports, 2015
    Co-Authors: Xing Chen
    Abstract:

    Accumulating experimental studies have indicated that lncRNAs play important roles in various critical biological processes, and that their alterations and dysregulations are associated with many important complex diseases. Effective computational models for predicting potential disease-lncRNA associations could benefit the understanding of disease mechanisms at the lncRNA level. They could also aid the detection of disease biomarkers for diagnosis, treatment, prognosis and prevention. However, experimentally confirmed disease-lncRNA associations are still very limited. In this study, a novel model of HyperGeometric distribution for LncRNA-Disease Association inference (HGLDA) was developed. HGLDA predicts lncRNA-disease associations by integrating miRNA-disease associations and lncRNA-miRNA interactions. Although HGLDA does not rely on any known disease-lncRNA associations, it still obtained an AUC of 0.7621 in leave-one-out cross validation. Furthermore, 19 predicted associations for breast cancer, lung cancer and colorectal cancer were verified by biological experimental studies. In addition, a model of LncRNA Functional Similarity Calculation based on the information of MiRNA (LFSCM) was developed. LFSCM calculates lncRNA functional similarity on a large scale by integrating disease semantic similarity, miRNA-disease associations and miRNA-lncRNA interactions. It is anticipated that HGLDA and LFSCM could be effective biological tools for biomedical research.
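
    The abstract does not spell out HGLDA's statistic. This minimal sketch assumes the standard hypergeometric over-representation test. It scores an lncRNA-disease pair by the probability of seeing at least the observed number of shared miRNAs by chance. The parameters are:
    - n_total: the number of miRNAs considered
    - n_lnc: the miRNAs interacting with the lncRNA
    - n_dis: the miRNAs associated with the disease
    - n_shared: their overlap

    The example counts are invented. `math.comb` requires Python 3.8 or later.

```python
from math import comb

def hypergeom_pvalue(n_total, n_lnc, n_dis, n_shared):
    """P(X >= n_shared) when n_lnc miRNAs are drawn from n_total, n_dis of which are disease-linked."""
    denom = comb(n_total, n_lnc)
    upper = min(n_lnc, n_dis)
    tail = sum(comb(n_dis, i) * comb(n_total - n_dis, n_lnc - i)
               for i in range(n_shared, upper + 1))
    return tail / denom

# Invented counts: 10 miRNAs in total, 3 bind the lncRNA, 4 link to the disease, 2 shared.
p = hypergeom_pvalue(10, 3, 4, 2)
```

    A small p-value means the lncRNA and the disease share more miRNAs than chance would suggest. Any multiple-testing correction used in practice is not shown.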

  • Constructing lncRNA functional similarity network based on lncRNA-disease associations and disease semantic similarity
    Scientific Reports, 2015
    Co-Authors: Xing Chen, Wen Ji, Yongdong Zhang
    Abstract:

    Increasing evidence has indicated that many lncRNAs play important roles in critical biological processes. Developing powerful computational models to construct lncRNA functional similarity networks from heterogeneous biological datasets is one of the most important and popular topics in the study of both lncRNAs and complex diseases. Constructing functional similarity networks could benefit model development for both lncRNA function inference and lncRNA-disease association identification. However, little effort has been made to analyze and calculate lncRNA functional similarity on a large scale. In this study, based on the assumption that functionally similar lncRNAs tend to be associated with similar diseases, we developed two novel lncRNA functional similarity calculation models (LNCSIM). LNCSIM was evaluated by introducing its similarity scores into the model of Laplacian Regularized Least Squares for LncRNA-Disease Association (LRLSLDA) for lncRNA-disease association prediction. The new predictive models improved the performance of LRLSLDA in leave-one-out cross validation on various datasets of known lncRNA-disease associations. Furthermore, some of the predictions for colorectal cancer and lung cancer were verified by independent biological experimental studies. It is anticipated that LNCSIM could be a useful and important biological tool for human disease diagnosis, treatment and prevention.
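
    One common way to turn disease semantic similarity into lncRNA functional similarity takes two steps, sketched here under stated assumptions. First, disease pairs are scored on a MeSH-style DAG with the semantic-contribution method of Wang et al. Under that method, each ancestor contributes a decay factor of 0.5 times its child's contribution, maximized over paths. Second, the disease groups linked to two lncRNAs are compared with a best-match average. The DAG is a hypothetical fragment. Whether each LNCSIM variant uses exactly these formulas is an assumption, not something the abstract states.

```python
DELTA = 0.5  # assumed semantic-contribution decay factor

# Hypothetical disease DAG fragment (child -> parents), loosely MeSH-like.
PARENTS = {
    "Neoplasms": [],
    "Lung Neoplasms": ["Neoplasms"],
    "Breast Neoplasms": ["Neoplasms"],
    "Small Cell Lung Carcinoma": ["Lung Neoplasms"],
}

def contributions(d):
    """Contribution of each term in d's DAG: 1 for d, DELTA-decayed for ancestors (max over paths)."""
    contrib = {d: 1.0}
    frontier = [d]
    while frontier:
        t = frontier.pop()
        for p in PARENTS[t]:
            value = DELTA * contrib[t]
            if value > contrib.get(p, 0.0):
                contrib[p] = value
                frontier.append(p)  # re-propagate when a larger value is found
    return contrib

def disease_sim(a, b):
    """Shared contributions divided by the two diseases' total semantic values."""
    ca, cb = contributions(a), contributions(b)
    shared = ca.keys() & cb.keys()
    return sum(ca[t] + cb[t] for t in shared) / (sum(ca.values()) + sum(cb.values()))

def group_sim(g1, g2):
    """Best-match average between two disease groups (e.g. diseases linked to two lncRNAs)."""
    best1 = sum(max(disease_sim(a, b) for b in g2) for a in g1)
    best2 = sum(max(disease_sim(a, b) for a in g1) for b in g2)
    return (best1 + best2) / (len(g1) + len(g2))

lnc_sim = group_sim(["Lung Neoplasms"], ["Lung Neoplasms", "Breast Neoplasms"])
```

    Sibling diseases that share only a parent score 1/3 here. A disease compared with itself scores 1.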

Zhen Tian - One of the best experts on this subject based on the ideXlab platform.

  • Refine gene functional similarity network based on interaction networks
    BMC Bioinformatics, 2017
    Co-Authors: Zhen Tian, Chunyu Wang, Shiming Wang
    Abstract:

    In recent years, biological interaction networks have become the basis of some essential studies and have achieved success in many applications. Some typical networks, such as protein-protein interaction networks, have already been investigated systematically. However, little work has so far been done on the construction of gene functional similarity networks. In this research, we try to build a highly reliable gene functional similarity network to promote its further application. Here, we propose a novel method to construct and refine the gene functional similarity network. It consists of three main steps. First, we establish an integrated gene functional similarity network based on different functional similarity calculation methods. Then, we construct a reference gene-gene association network based on protein-protein interaction networks. Finally, we prune spurious edges from the integrated gene functional similarity network with the help of the reference gene-gene association network. Experimental results indicate that the refined gene functional similarity network (RGFSN) exhibits a scale-free, small-world and modular architecture, with a degree distribution that best fits a power law. In addition, we conduct a protein complex prediction experiment for human based on RGFSN and achieve an outstanding result, which implies that it has high reliability and wide applicability. Our work offers insight into constructing and refining gene functional similarity networks, and the approach can be applied to build other high-quality biological networks.
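
    The three steps can be sketched as a small pipeline. The abstract does not give the concrete integration or reference-network rules, so all three rules here are hypothetical placeholders:
    - integration: averaging each pair's scores across methods, then applying a score threshold
    - reference network: a "within k hops in the PPI graph" criterion
    - pruning: keeping only edges that appear in the reference network

    Gene pairs are stored as frozensets so that (g1, g2) and (g2, g1) are the same edge.

```python
from collections import deque

def integrate(score_tables, threshold):
    """Step 1 (hypothetical rule): average each pair's score across methods, keep pairs >= threshold."""
    pairs = set().union(*score_tables)
    edges = {}
    for pair in pairs:
        scores = [t[pair] for t in score_tables if pair in t]
        mean = sum(scores) / len(scores)
        if mean >= threshold:
            edges[pair] = mean
    return edges

def within_k_hops(ppi, max_hops):
    """Step 2 (hypothetical rule): gene pairs at most max_hops apart in the PPI graph (BFS)."""
    reachable = set()
    for src in ppi:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == max_hops:
                continue
            for v in ppi.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        reachable |= {frozenset((src, v)) for v in dist if v != src}
    return reachable

def refine(edges, reference):
    """Step 3: drop integrated edges that have no support in the reference network."""
    return {pair: s for pair, s in edges.items() if pair in reference}

# Invented scores from two similarity methods and a toy PPI graph.
scores1 = {frozenset(("g1", "g2")): 0.9, frozenset(("g1", "g4")): 0.8}
scores2 = {frozenset(("g1", "g2")): 0.7, frozenset(("g1", "g4")): 0.6,
           frozenset(("g2", "g3")): 0.2}
PPI = {"g1": {"g2"}, "g2": {"g1", "g3"}, "g3": {"g2"}, "g4": set()}
refined = refine(integrate([scores1, scores2], 0.5), within_k_hops(PPI, 2))
```

    In this toy run the g1-g4 edge has a high averaged score. It is still removed, because g4 has no interaction partners in the PPI graph.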

  • An improved method for functional similarity analysis of genes based on gene ontology
    BMC Systems Biology, 2016
    Co-Authors: Zhen Tian, Chunyu Wang, Zhixia Teng

  • SGFSC: speeding the gene functional similarity calculation based on hash tables
    BMC Bioinformatics, 2016
    Co-Authors: Zhen Tian, Chunyu Wang, Zhixia Teng

Vasileios Megalooikonomou - One of the best experts on this subject based on the ideXlab platform.

  • Learning pair-wise gene functional similarity by multiplex gene expression maps
    BMC Bioinformatics, 2012
    Co-Authors: Li An, Haibin Ling, Zoran Obradovic, Desmond J Smith, Vasileios Megalooikonomou
    Abstract:

    Background: The relationships between gene functional similarity and gene expression profiles, and between gene function annotation and gene sequence, have been studied extensively. However, little work has considered the connection between gene functions and the location of a gene's expression in mammalian tissues. Moreover, although unsupervised learning methods are commonly used in functional genomics, supervised learning cannot be directly applied to a set of normal genes that lack a target (class) attribute. Results: Here, we propose a supervised learning methodology to predict pair-wise gene functional similarity from multiplex gene expression maps, which provide information about the location of gene expression. The features are extracted from the expression maps, and the labels denote the functional similarities of pairs of genes. We use wavelet features, original expression values, difference and average values of neighboring voxels, and other features to perform boosting analysis. The experimental results show that functional similarity increases as the similarity of gene expression maps increases. The model predicts the functional similarities between genes to a certain degree, and the weights of the features in the model indicate which features are more significant for this prediction. Conclusions: By considering pairs of genes, we propose a supervised learning methodology to predict pair-wise gene functional similarity from multiplex gene expression maps. We also explore the relationship between the similarities of gene maps and gene functions. Using AdaBoost coupled with our proposed weak classifier, we analyze a large-scale gene expression dataset and predict gene functional similarities. We also detect the most significant single voxels and pairs of neighboring voxels and visualize them in the expression map image of a mouse brain. This work is important for predicting the functions of unknown genes. It also has broader applicability, since the methodology can be applied to any large-scale dataset without a target attribute and is not restricted to gene expression data.
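
    The pair-wise setup can be sketched as follows, under simplified assumptions. Each gene pair becomes one training example. Its features are the per-voxel absolute difference and average of the two genes' expression maps. A plain AdaBoost over one-feature decision stumps stands in for the authors' proposed weak classifier. Wavelet and neighboring-voxel features are omitted. The 3-voxel maps and labels are invented (+1 = functionally similar, -1 = not).

```python
import math

def pair_features(map_a, map_b):
    """Hypothetical pair encoding: per-voxel |difference| followed by per-voxel average."""
    diffs = [abs(a - b) for a, b in zip(map_a, map_b)]
    means = [(a + b) / 2 for a, b in zip(map_a, map_b)]
    return diffs + means

def stump_predict(x, j, thr, sign):
    return sign if x[j] <= thr else -sign

def fit_stump(X, y, w):
    """Best (weighted-error) single-feature threshold rule; labels are +1/-1."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for wi, x, yi in zip(w, X, y)
                          if stump_predict(x, j, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost: re-weight examples so later stumps focus on earlier mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep alpha finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, sign))
        w = [wi * math.exp(-alpha * yi * stump_predict(x, j, thr, sign))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    score = sum(alpha * stump_predict(x, j, thr, sign) for alpha, j, thr, sign in model)
    return 1 if score >= 0 else -1

MAPS = {  # invented 3-voxel expression maps
    "g1": [1.0, 0.0, 2.0], "g2": [1.1, 0.1, 2.0],
    "g3": [0.0, 3.0, 0.0], "g4": [0.1, 2.9, 0.2],
}
PAIRS = [("g1", "g2", 1), ("g3", "g4", 1), ("g1", "g3", -1), ("g2", "g4", -1)]
X = [pair_features(MAPS[a], MAPS[b]) for a, b, _ in PAIRS]
y = [label for _, _, label in PAIRS]
model = adaboost(X, y, rounds=5)
```

    The alpha weight of each stump plays the role of the feature weights mentioned in the abstract. Summing alpha per feature index gives a rough ranking of which voxel features matter most.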

Chunyu Wang - One of the best experts on this subject based on the ideXlab platform.

  • Refine gene functional similarity network based on interaction networks
    BMC Bioinformatics, 2017
    Co-Authors: Zhen Tian, Chunyu Wang, Shiming Wang

  • An improved method for functional similarity analysis of genes based on gene ontology
    BMC Systems Biology, 2016
    Co-Authors: Zhen Tian, Chunyu Wang, Zhixia Teng

  • SGFSC: speeding the gene functional similarity calculation based on hash tables
    BMC Bioinformatics, 2016
    Co-Authors: Zhen Tian, Chunyu Wang, Zhixia Teng

  • Measuring gene functional similarity based on group-wise comparison of GO terms
    Bioinformatics, 2013
    Co-Authors: Zhixia Teng, Chunyu Wang, Ping Xuan