Protein Function

The Experts below are selected from a list of 1084377 Experts worldwide ranked by ideXlab platform

Steven Henikoff - One of the best experts on this subject based on the ideXlab platform.

  • Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm
    Nature Protocols, 2009
    Co-Authors: Pradeep Kumar, Pauline C Ng, Steven Henikoff
    Abstract:

    The effect of genetic mutation on phenotype is of significant interest in genetics. The type of genetic mutation that causes a single amino acid substitution (AAS) in a protein sequence is called a non-synonymous single nucleotide polymorphism (nsSNP). An nsSNP could potentially affect the function of the protein, subsequently altering the carrier's phenotype. This protocol describes the use of the 'Sorting Intolerant From Tolerant' (SIFT) algorithm in predicting whether an AAS affects protein function. To assess the effect of a substitution, SIFT assumes that important positions in a protein sequence have been conserved throughout evolution and therefore substitutions at these positions may affect protein function. Thus, by using sequence homology, SIFT predicts the effects of all possible substitutions at each position in the protein sequence. The protocol typically takes 5–20 min, depending on the input. SIFT is available as an online tool (http://sift-dna.org).

  • Predicting the effects of amino acid substitutions on protein function
    Annual Review of Genomics and Human Genetics, 2006
    Co-Authors: Pauline C Ng, Steven Henikoff
    Abstract:

    Nonsynonymous single nucleotide polymorphisms (nsSNPs) are coding variants that introduce amino acid changes in their corresponding proteins. Because nsSNPs can affect protein function, they are believed to have the largest impact on human health compared with SNPs in other regions of the genome. Therefore, it is important to distinguish those nsSNPs that affect protein function from those that are functionally neutral. Here we provide an overview of amino acid substitution (AAS) prediction methods, which use sequence and/or structure to predict the effect of an AAS on protein function. Most methods predict approximately 25–30% of human nsSNPs to negatively affect protein function, and such nsSNPs tend to be rare in the population. We discuss the utility of AAS prediction methods for Mendelian and complex diseases as well as their broader applications for understanding protein function.

  • SIFT: predicting amino acid changes that affect protein function
    Nucleic Acids Research, 2003
    Co-Authors: Pauline C Ng, Steven Henikoff
    Abstract:

    Single nucleotide polymorphism (SNP) studies and random mutagenesis projects identify amino acid substitutions in protein-coding regions. Each substitution has the potential to affect protein function. SIFT (Sorting Intolerant From Tolerant) is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study. We have shown that SIFT can distinguish between functionally neutral and deleterious amino acid changes in mutagenesis studies and on human polymorphisms. SIFT is available at http://blocks.fhcrc.org/sift/SIFT.html.

  • Accounting for human polymorphisms predicted to affect protein function
    Genome Research, 2002
    Co-Authors: Steven Henikoff
    Abstract:

    A major interest in human genetics is to determine whether a nonsynonymous single-base nucleotide polymorphism (nsSNP) in a gene affects its protein product and, consequently, impacts the carrier's health. We used the SIFT (Sorting Intolerant From Tolerant) program to predict that 25% of 3084 nsSNPs from dbSNP, a public SNP database, would affect protein function. Some of the nsSNPs predicted to affect function were variants known to be associated with disease. Others were artifacts of SNP discovery. Two reports have indicated that there are thousands of damaging nsSNPs in an individual's genome; we find the number is likely to be much lower.
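
The conservation idea behind SIFT can be illustrated with a minimal Python sketch. It is not the SIFT implementation: the real program builds its alignment from a homology search and uses sequence weighting and more elaborate pseudocount estimates, whereas this sketch assumes a ready-made alignment column, adds a flat pseudocount (0.1, an arbitrary choice), and scales each residue's probability by that of the most probable residue at the position. Substitutions scoring below 0.05, the cutoff commonly cited for SIFT, are labeled deleterious. The helpers `scaled_probabilities` and `predict_position` are hypothetical names.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def scaled_probabilities(column, pseudocount=0.1):
    """Scaled residue probabilities for one alignment column (simplified).

    Probabilities come from residue counts plus a flat pseudocount and are
    divided by the largest probability, so the most likely residue scores 1.0.
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)  # gaps are ignored
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    probs = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
    top = max(probs.values())
    return {aa: p / top for aa, p in probs.items()}

def predict_position(column, cutoff=0.05):
    """Label every possible substitution at one position as tolerated or deleterious."""
    scores = scaled_probabilities(column)
    return {aa: (s, "deleterious" if s < cutoff else "tolerated") for aa, s in scores.items()}

conserved = predict_position("LLLLLLLLLL")  # invariant leucine column
diverse = predict_position("ACDEFGHIKL")    # highly variable column
print(conserved["P"], diverse["W"])
```

At the invariant leucine position, proline falls well under the cutoff, while at the highly variable position even an unseen tryptophan is tolerated, mirroring the assumption that substitutions matter most at conserved positions.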

Pauline C Ng - One of the best experts on this subject based on the ideXlab platform.

  • Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm
    Nature Protocols, 2009
    Co-Authors: Pradeep Kumar, Pauline C Ng, Steven Henikoff
    Abstract:

    The effect of genetic mutation on phenotype is of significant interest in genetics. The type of genetic mutation that causes a single amino acid substitution (AAS) in a protein sequence is called a non-synonymous single nucleotide polymorphism (nsSNP). An nsSNP could potentially affect the function of the protein, subsequently altering the carrier's phenotype. This protocol describes the use of the 'Sorting Intolerant From Tolerant' (SIFT) algorithm in predicting whether an AAS affects protein function. To assess the effect of a substitution, SIFT assumes that important positions in a protein sequence have been conserved throughout evolution and therefore substitutions at these positions may affect protein function. Thus, by using sequence homology, SIFT predicts the effects of all possible substitutions at each position in the protein sequence. The protocol typically takes 5–20 min, depending on the input. SIFT is available as an online tool (http://sift-dna.org).

  • Predicting the effects of amino acid substitutions on protein function
    Annual Review of Genomics and Human Genetics, 2006
    Co-Authors: Pauline C Ng, Steven Henikoff
    Abstract:

    Nonsynonymous single nucleotide polymorphisms (nsSNPs) are coding variants that introduce amino acid changes in their corresponding proteins. Because nsSNPs can affect protein function, they are believed to have the largest impact on human health compared with SNPs in other regions of the genome. Therefore, it is important to distinguish those nsSNPs that affect protein function from those that are functionally neutral. Here we provide an overview of amino acid substitution (AAS) prediction methods, which use sequence and/or structure to predict the effect of an AAS on protein function. Most methods predict approximately 25–30% of human nsSNPs to negatively affect protein function, and such nsSNPs tend to be rare in the population. We discuss the utility of AAS prediction methods for Mendelian and complex diseases as well as their broader applications for understanding protein function.

  • SIFT: predicting amino acid changes that affect protein function
    Nucleic Acids Research, 2003
    Co-Authors: Pauline C Ng, Steven Henikoff
    Abstract:

    Single nucleotide polymorphism (SNP) studies and random mutagenesis projects identify amino acid substitutions in protein-coding regions. Each substitution has the potential to affect protein function. SIFT (Sorting Intolerant From Tolerant) is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study. We have shown that SIFT can distinguish between functionally neutral and deleterious amino acid changes in mutagenesis studies and on human polymorphisms. SIFT is available at http://blocks.fhcrc.org/sift/SIFT.html.

Karin M. Verspoor - One of the best experts on this subject based on the ideXlab platform.

  • An expanded evaluation of protein function prediction methods shows an improvement in accuracy
    Genome Biology, 2016
    Co-Authors: Yuxiang Jiang, Tal Ronnen Oron, Asma R. Bankapur, Daniel D’Andrea, Rosalba Lepore, Karin M. Verspoor, Indika Kahanda, Christopher S Funk, Wyatt T. Clark, Asa Ben-Hur
    Abstract:

    Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and that predictions in the biological process and human phenotype ontologies are relatively diverse. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.

  • An expanded evaluation of protein function prediction methods shows an improvement in accuracy
    arXiv: Quantitative Methods, 2016
    Co-Authors: Yuxiang Jiang, Tal Ronnen Oron, Asma R. Bankapur, Rosalba Lepore, Karin M. Verspoor, Indika Kahanda, Christopher S Funk, Wyatt T. Clark, Daniel D’Andrea, Asa Ben-Hur
    Abstract:

    Background: The increasing volume and variety of genotypic and phenotypic data is a major defining characteristic of modern biomedical sciences. At the same time, the limitations in technology for generating data and the inherently stochastic nature of biomolecular events have led to a discrepancy between the volume of data and the amount of knowledge gleaned from it. A major bottleneck in our ability to understand the molecular underpinnings of life is the assignment of function to biological macromolecules, especially proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, accurately assessing methods for protein function prediction and tracking progress in the field remain challenging. Methodology: We have conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. One hundred twenty-six methods from 56 research groups were evaluated for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3,681 proteins from 18 species. CAFA2 featured significantly expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis also compared the best methods participating in CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed the best methods from CAFA1, demonstrating that computational function prediction is improving. This increased accuracy can be attributed to the combined effect of the growing number of experimental annotations and improved methods for function prediction.

  • Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct
    Journal of Biomedical Semantics, 2015
    Co-Authors: Christopher S Funk, Asa Ben-Hur, Indika Kahanda, Karin M. Verspoor
    Abstract:

    Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature-based features are useful for predicting human protein function (F-max: molecular function = 0.408, biological process = 0.461, cellular component = 0.608). One advantage of using literature features is that they make automated predictions easy to verify. We find through manual inspection of misclassifications that some false positive predictions could be biologically valid predictions based upon support extracted from the literature. Additionally, we present a “medium-throughput” pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated.
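
Both the CAFA assessment and the GOstruct evaluation report F-max. The sketch below follows the protein-centric definition usually given for CAFA, which may differ in detail from the one used in the GOstruct work: at each threshold τ, precision is averaged over proteins with at least one prediction scoring at least τ, recall is averaged over all benchmark proteins, and Fmax is the best harmonic mean across thresholds. It assumes predictions and annotations are already propagated to ontology ancestors and are scored one ontology at a time; the function name, input format, and toy GO identifiers are illustrative.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax in the style of CAFA (illustrative sketch).

    predictions -- {protein: {term: score in [0, 1]}}
    truth       -- {protein: non-empty set of experimentally annotated terms}
    Returns (best F-measure, threshold at which it was first reached).
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best_f, best_tau = 0.0, None
    for tau in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            predicted = {t for t, s in predictions.get(protein, {}).items() if s >= tau}
            hits = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with >= 1 prediction at tau
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(true_terms))  # recall counts every benchmark protein
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            f = 2 * p * r / (p + r)
            if f > best_f:
                best_f, best_tau = f, tau
    return best_f, best_tau

truth = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}
predictions = {"P1": {"GO:A": 0.9, "GO:B": 0.4, "GO:D": 0.8}, "P2": {"GO:C": 0.7}}
print(fmax(predictions, truth))
```

On this toy benchmark the best F-measure, 10/11, is reached at the lowest threshold tried: the single false positive (GO:D) costs less precision than raising τ costs in recall.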

Chris Ding - One of the best experts on this subject based on the ideXlab platform.

  • From protein sequence to protein function via multi-label linear discriminant analysis
    IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017
    Co-Authors: Hua Wang, Heng Huang, Lin Yan, Chris Ding
    Abstract:

    Sequence describes the primary structure of a protein, which contains important structural, characteristic, and genetic information and thereby motivates many sequence-based computational approaches to infer protein function. Among them, feature-based approaches attract increased attention because they make predictions from a set of transformed and more biologically meaningful sequence features. However, original features extracted from sequence are usually of high dimensionality and often compromised by irrelevant patterns; therefore, dimension reduction is necessary prior to classification for efficient and effective protein function prediction. A protein usually performs several different functions within an organism, which makes protein function prediction a multi-label classification problem. In machine learning, multi-label classification deals with problems where each object may belong to more than one class. As a well-known feature reduction method, linear discriminant analysis (LDA) has been successfully applied in many practical applications. However, it is by nature designed for single-label classification, in which each object can belong to exactly one class. Because directly applying LDA in multi-label classification causes ambiguity when computing scatter matrices, we apply a new Multi-label Linear Discriminant Analysis (MLDA) approach to address this problem while preserving the powerful classification capability inherited from classical LDA. We further extend MLDA by $\ell_1$-normalization to overcome the problem of over-counting data points with multiple labels. In addition, we incorporate biological network data into our method using Laplacian embedding, and assess the reliability of predicted putative functions. Extensive empirical evaluations demonstrate promising results of our methods.

  • Protein function prediction via Laplacian network partitioning incorporating function category correlations
    International Joint Conference on Artificial Intelligence, 2013
    Co-Authors: Hua Wang, Heng Huang, Chris Ding
    Abstract:

    Understanding the molecular mechanisms of life requires decoding the functions of the proteins in an organism. Various high-throughput experimental techniques have been developed to characterize biological systems at the genome scale. A fundamental challenge of the post-genomic era is to assign biological functions to all the proteins encoded by the genome using high-throughput biological data. To address this challenge, we propose a novel Laplacian Network Partitioning incorporating Function category Correlations (LNPC) method to predict protein function on protein-protein interaction (PPI) networks by optimizing a Laplacian-based quotient objective function that seeks the optimal network configuration to maximize consistent function assignments over edges on the whole graph. Unlike existing approaches, which have no unique optimization solution, our optimization problem has a unique global solution obtained by eigen-decomposition. The correlations among protein function categories are quantified and incorporated into a correlated protein affinity graph, which is integrated into the PPI graph to significantly improve protein function prediction accuracy. We apply our new method to the BioGRID dataset for Saccharomyces cerevisiae using the MIPS annotation scheme. Our new method outperforms related state-of-the-art approaches by more than 63% in the average precision of function prediction and by 53% in the average F1 score.

  • Function-function correlated multi-label protein function prediction over interaction networks
    Journal of Computational Biology, 2013
    Co-Authors: Hua Wang, Heng Huang, Chris Ding
    Abstract:

    Many previous works in protein function prediction make predictions one function at a time, which fundamentally assumes the functional categories to be isolated. However, biological processes are highly correlated and usually intertwined, occurring at the same time; therefore, it would be beneficial to consider protein function prediction as one indivisible task and treat all the functional categories as an integral and correlated prediction target. By leveraging the function-function correlations, we expect to achieve improved overall predictive accuracy. To this end, we develop a network-based protein function prediction approach, under the framework of multi-label classification in machine learning, to utilize the function-function correlations. Besides formulating the function-function correlations explicitly in the optimization objective, we also exploit them implicitly as part of the pairwise protein-protein similarities. The algorithm is built upon the Green's function over a graph, which not only employs the global topology of a network but also captures its local structures. In addition, we propose an adaptive decision boundary method to deal with the unbalanced distribution of protein annotation data. Finally, we quantify the statistical confidence of predicted functions to facilitate post-processing of proteomic analysis. We evaluate the proposed approach on Saccharomyces cerevisiae data, and the experimental results are very encouraging.

  • Function-function correlated multi-label protein function prediction over interaction networks
    Research in Computational Molecular Biology, 2012
    Co-Authors: Hua Wang, Heng Huang, Chris Ding
    Abstract:

    Many previous computational methods for protein function prediction make predictions one function at a time, which is fundamentally equivalent to assuming that the functional categories of proteins are isolated. However, biological processes are highly correlated and usually intertwined, occurring at the same time; therefore, it would be beneficial to consider protein function prediction as one indivisible task and treat all the functional categories as an integral and correlated prediction target. By leveraging the function-function correlations, we expect to achieve improved overall predictive accuracy. To this end, we develop a novel network-based protein function prediction approach, under the framework of multi-label classification in machine learning, to utilize the function-function correlations. Besides formulating the function-function correlations explicitly in the optimization objective, we also exploit them implicitly as part of the pairwise protein-protein similarities. The algorithm is built upon the Green's function over a graph, which not only employs the global topology of a network but also captures its local structural information. We evaluate the proposed approach on Saccharomyces cerevisiae. The encouraging experimental results demonstrate the effectiveness of the proposed method.
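
The function-function correlated approach above is built on the Green's function of a graph. The following is a rough NumPy sketch of that general idea, not the authors' algorithm: it takes the Green's function to be the Moore-Penrose pseudo-inverse of the graph Laplacian (one common definition), uses it to propagate known labels over a PPI adjacency matrix, and then mixes scores across functions using cosine similarity between label columns as a stand-in for the function-function correlation. The adaptive decision boundary and confidence estimation are omitted, and `greens_function_scores` is a hypothetical name.

```python
import numpy as np

def greens_function_scores(W, Y, train_mask):
    """Score every protein for every function (hypothetical helper).

    W          -- (n, n) symmetric, non-negative PPI adjacency matrix
    Y          -- (n, k) 0/1 matrix of known function labels
    train_mask -- (n,) boolean array, True for proteins whose labels are known
    """
    # Combinatorial graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # Graph Green's function, taken here as the pseudo-inverse of L.
    G = np.linalg.pinv(L)
    # Hide the labels of proteins we want to predict.
    Y_train = Y * train_mask[:, None]
    # Function-function correlation: cosine similarity between label columns.
    norms = np.linalg.norm(Y_train, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unused functions
    C = (Y_train.T @ Y_train) / np.outer(norms, norms)
    # Propagate labels over the network, then share evidence across correlated functions.
    return G @ Y_train @ C

# Toy network: a 4-protein path 0-1-2-3; function 0 is known at protein 0, function 1 at protein 3.
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], dtype=float)
scores = greens_function_scores(W, Y, np.array([True, False, False, True]))
print(scores.round(3))
```

In the toy path network, each unlabeled protein scores highest for the function annotated nearest to it, because Green's function entries decrease with network (resistance) distance.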

Iddo Friedberg - One of the best experts on this subject based on the ideXlab platform.

  • Biases in the experimental annotations of protein function and their effect on our understanding of protein function space
    PLOS Computational Biology, 2013
    Co-Authors: Alexandra M Schnoes, Patricia C Babbitt, David C Ream, A Thorman, Iddo Friedberg
    Abstract:

    The ongoing functional annotation of proteins relies upon the work of curators to capture experimental findings from scientific literature and apply them to protein sequence and structure data. However, with the increasing use of high-throughput experimental assays, a small number of experimental studies dominate the functional protein annotations collected in databases. Here, we investigate just how prevalent the “few articles - many proteins” phenomenon is. We examine the experimentally validated annotation of proteins provided by several groups in the GO Consortium, and show that the distribution of proteins per published study is exponential, with 0.14% of articles providing the source of annotations for 25% of the proteins in the UniProt-GOA compilation. Since each of the dominant articles describes the use of an assay that can find only one function or a small group of functions, this leads to substantial biases in what we know about the function of many proteins. Mass spectrometry, microscopy and RNAi experiments dominate high-throughput experiments. Consequently, the functional information derived from these experiments mostly concerns the subcellular location of proteins and the participation of proteins in embryonic developmental pathways. For some organisms, the information provided by different studies overlaps substantially. We also show that the information provided by high-throughput experiments is less specific than that provided by low-throughput experiments. Given the experimental techniques available, certain biases in protein function annotation due to high-throughput experiments are unavoidable. Knowing that these biases exist and understanding their characteristics and extent is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments.

  • Computational protein function prediction: are we making progress?
    Cellular and Molecular Life Sciences, 2007
    Co-Authors: Adam Godzik, Martin Jambon, Iddo Friedberg
    Abstract:

    The computational prediction of gene and protein function is rapidly gaining ground as a central undertaking in computational biology. Making sense of the flood of genomic data requires fast and reliable annotation. Many ingenious algorithms have been devised to infer a protein’s function from its amino acid sequence, 3D structure and the chromosomal location of the encoding genes. However, there are significant challenges in assessing how well these programs perform. In this article we explore those challenges and review our own attempt at assessing the performance of those programs. We conclude that the task is far from complete and that a critical assessment of the performance of function prediction programs is necessary to make true progress in computational function prediction.

  • Automated protein function prediction: the genomic challenge
    Briefings in Bioinformatics, 2006
    Co-Authors: Iddo Friedberg
    Abstract:

    Overwhelmed with genomic data, biologists are facing the first big post-genomic question: what do all genes do? First, not only is the volume of pure sequence and structure data growing, but its diversity is growing as well, leading to a disproportionate growth in the number of uncharacterized gene products. Consequently, established methods of gene and protein annotation, such as homology-based transfer, are annotating less data and in many cases are amplifying existing erroneous annotation. Second, there is a need for functional annotation that is standardized and machine-readable so that function prediction programs can be incorporated into larger workflows. This is problematic due to the subjective and contextual definition of protein function. Third, there is a need to assess the quality of function predictors. Again, the subjectivity of the term ‘function’ and the various aspects of biological function make this a challenging effort. This article briefly outlines the history of automated protein function prediction and surveys the latest innovations in all three topics.
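
The “few articles - many proteins” statistic reported above (0.14% of articles accounting for 25% of annotated proteins) can be computed along the lines of this Python sketch. It assumes a mapping from each article to the set of proteins it annotates and, as a simplifying assumption, ranks articles by how many proteins they annotate before accumulating coverage; the study's exact procedure may differ, and the helper name and toy data are hypothetical.

```python
def fraction_of_articles_for_coverage(annotations, target=0.25):
    """Fraction of articles needed to cover `target` of all annotated proteins.

    annotations -- {article_id: set of protein ids annotated from that article}
    Articles are taken largest-first (an assumption made for this sketch).
    """
    all_proteins = set().union(*annotations.values())
    needed = target * len(all_proteins)
    covered = set()
    ranked = sorted(annotations.values(), key=len, reverse=True)
    for used, proteins in enumerate(ranked, start=1):
        covered |= proteins  # a protein annotated by several articles counts once
        if len(covered) >= needed:
            return used / len(annotations)
    return 1.0

# Toy data: one high-throughput screen and 99 single-protein articles.
annotations = {"screen": set(range(100)), **{f"a{i}": {100 + i} for i in range(99)}}
print(fraction_of_articles_for_coverage(annotations))
```

Here a single high-throughput article (1% of all articles) already covers more than a quarter of the annotated proteins, the same kind of skew the study quantifies at much larger scale.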