Orthology

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 16065 Experts worldwide ranked by ideXlab platform

Erik L L Sonnhammer - One of the best experts on this subject based on the ideXlab platform.

  • Domainoid: domain-oriented orthology inference
    BMC Bioinformatics, 2019
    Co-Authors: Emma Persson, Mateusz Kaduk, Sofia K Forslund, Erik L L Sonnhammer
    Abstract:

    Orthology inference is normally based on full-length protein sequences. However, most proteins contain independently folding and recurring regions, called domains. The domain architecture of a protein is vital for its function, and recombination events mean individual domains can have different evolutionary histories. It has previously been shown that orthologous proteins may differ in domain architecture, creating challenges for orthology inference methods operating on full-length sequences. We have developed Domainoid, a new tool aiming to overcome these challenges faced by full-length orthology methods by inferring orthology on the domain level. It employs the InParanoid algorithm on single domains separately, to infer groups of orthologous domains. This domain-oriented approach allows detection of discordant domain orthologs, cases where different domains on the same protein have different evolutionary histories. In addition to domain level analysis, protein level orthology based on the fraction of domains that are orthologous can be inferred. Domainoid orthology assignments were compared to those yielded by the conventional full-length approach InParanoid, and were validated in a standard benchmark. Our results show that domain-based orthology inference can reveal many orthologous relationships that are not found by full-length sequence approaches. Availability: https://bitbucket.org/sonnhammergroup/domainoid/

  • InParanoid 8: orthology analysis between 273 proteomes, mostly eukaryotic
    Nucleic Acids Research, 2015
    Co-Authors: Erik L L Sonnhammer, Gabriel Ostlund
    Abstract:

    The InParanoid database (http://InParanoid.sbc.su.se) provides a user interface to orthologs inferred by the InParanoid algorithm. As there are now international efforts to curate and standardize complete proteomes, we have switched to using these resources rather than gathering and curating the proteomes ourselves. InParanoid release 8 is based on the 66 reference proteomes that the ‘Quest for Orthologs’ community has agreed on using, plus 207 additional proteomes from the UniProt complete proteomes—in total 273 species. These represent 246 eukaryotes, 20 bacteria and seven archaea. Compared to the previous release, this increases the number of species by 173% and the number of pairwise species comparisons by 650%. In turn, the number of ortholog groups has increased by 423%. We present the contents and usages of InParanoid 8, and a detailed analysis of how the proteome content has changed since the previous release.

  • Hieranoid: hierarchical orthology inference
    Journal of Molecular Biology, 2013
    Co-Authors: Erik L L Sonnhammer, Fabian Schreiber
    Abstract:

    An accurate inference of orthologs is essential in many research fields such as comparative genomics, molecular evolution, and genome annotation. Existing methods for genome-scale orthology inferen ...

  • Letter to the Editor: SeqXML and OrthoXML: standards for sequence and orthology information
    Briefings in Bioinformatics, 2011
    Co-Authors: Thomas Schmitt, David N Messina, Fabian Schreiber, Erik L L Sonnhammer
    Abstract:

    There is a great need for standards in the orthology field. Users must contend with different ortholog data representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems. SeqXML is a lightweight format for sequence records, the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad and not limited to ortholog prediction. We provide read/write functions for BioJava, BioPerl, and Biopython. OrthoXML was designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information. A unified format is particularly valuable for ortholog consumers that want to integrate data from numerous resources, e.g. for gene annotation projects. Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information.

  • Orthology confers intron position conservation
    BMC Genomics, 2010
    Co-Authors: Anna Henricson, Erik L L Sonnhammer, Kristoffer Forslund
    Abstract:

    Background: With the wealth of genomic data available it has become increasingly important to assign putative protein function through functional transfer between orthologs. Therefore, correct elucidation of the evolutionary relationships among genes is a critical task, and attempts should be made to further improve the phylogenetic inference by adding relevant discriminating features. It has been shown that introns can maintain their position over long evolutionary timescales. For this reason, it could be possible to use conservation of intron positions as a discriminating factor when assigning orthology. Therefore, we wanted to investigate whether orthologs have a higher degree of intron position conservation (IPC) compared to non-orthologous sequences that are equally similar in sequence.
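
The release figures quoted in the InParanoid 8 abstract above can be checked with a few lines of arithmetic. The sketch below is only a consistency check: the previous release size of 100 species is not stated in the abstract and is an assumption back-calculated from the reported 173% increase, and `percent_increase` is a helper name introduced here for illustration.

```python
from math import comb


def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Figures stated in the abstract.
reference_proteomes = 66
additional_proteomes = 207
eukaryotes, bacteria, archaea = 246, 20, 7

new_species = reference_proteomes + additional_proteomes
# Both breakdowns in the abstract should give the same total.
assert new_species == eukaryotes + bacteria + archaea == 273

# Assumption: previous release size inferred from the stated 173% increase.
old_species = 100

# Every unordered pair of species is compared once.
new_pairs = comb(new_species, 2)
old_pairs = comb(old_species, 2)

print(f"species increase: {percent_increase(old_species, new_species):.0f}%")
print(f"pairwise comparisons: {old_pairs} -> {new_pairs} "
      f"({percent_increase(old_pairs, new_pairs):.0f}% increase)")
```

Under that assumption the number of pairwise comparisons grows roughly quadratically with the number of species, which is why a 173% increase in species corresponds to a much larger increase in comparisons.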

Christophe Dessimoz - One of the best experts on this subject based on the ideXlab platform.

  • Orthology: Definitions, Prediction, and Impact on Species Phylogeny Inference
    2020
    Co-Authors: Rosa Fernández, Toni Gabaldón, Christophe Dessimoz
    Abstract:

    Orthology is a central concept in evolutionary and comparative genomics, used to relate corresponding genes in different species. In particular, orthologs are needed to infer species trees. In this chapter, we introduce the fundamental concepts of orthology relationships and orthologous groups, including some non-trivial (and thus commonly misunderstood) implications. Next, we review some of the main methods and resources used to identify orthologs. The final part of the chapter discusses the impact of orthology methods on species phylogeny inference, drawing lessons from several recent comparative studies.

  • Orthology: definitions, inference, and impact on species phylogeny inference
    arXiv: Populations and Evolution, 2019
    Co-Authors: Rosa Fernández, Toni Gabaldón, Christophe Dessimoz
    Abstract:

    Orthology is a central concept in evolutionary and comparative genomics, used to relate corresponding genes in different species. In particular, orthologs are needed to infer species trees. In this chapter, we introduce the fundamental concepts of orthology relationships and orthologous groups, including some non-trivial (and thus commonly misunderstood) implications. Next, we review some of the main methods and resources used to identify orthologs. The final part of the chapter discusses the impact of orthology methods on species phylogeny inference, drawing lessons from several recent comparative studies.

  • Orthologous Matrix (OMA) algorithm 2.0: more robust to asymmetric evolutionary rates and more scalable hierarchical orthologous group inference
    Bioinformatics, 2017
    Co-Authors: Clementmarie Train, Adrian M Altenhoff, Natasha M Glover, Gaston H Gonnet, Christophe Dessimoz
    Abstract:

    Motivation: Accurate orthology inference is a fundamental step in many phylogenetic and comparative analyses. Many methods have been proposed, including OMA (Orthologous MAtrix). Yet substantial challenges remain, in particular in coping with fragmented genes or genes evolving at different rates after duplication, and in scaling to large datasets. With more and more genomes available, it is necessary to improve the scalability and robustness of orthology inference methods. Results: We present improvements in the OMA algorithm: (i) refining the pairwise orthology inference step to account for same-species paralogs evolving at different rates, and (ii) minimizing errors in the pairwise orthology verification step by testing the consistency of pairwise distance estimates, which can be problematic in the presence of fragmentary sequences. In addition, we introduce a more scalable procedure for hierarchical orthologous group (HOG) clustering, which is several orders of magnitude faster on large datasets. Using the Quest for Orthologs consortium orthology benchmark service, we show that these changes translate into substantial improvement on multiple empirical datasets. Availability and Implementation: This new OMA 2.0 algorithm is used in the OMA database ( http://omabrowser.org ) from the March 2017 release onwards, and can be run on custom genomes using OMA standalone version 2.0 and above ( http://omabrowser.org/standalone ). Contact: christophe.dessimoz@unil.ch or adrian.altenhoff@inf.ethz.ch.

  • Inferring orthology and paralogy
    In: UNSPECIFIED (pp. 259-279), 2012
    Co-Authors: Christophe Dessimoz, Adrian M Altenhoff
    Abstract:

    The distinction between orthologs and paralogs, genes that started diverging by speciation versus duplication, is relevant in a wide range of contexts, most notably phylogenetic tree inference and protein function annotation. In this chapter, we provide an overview of the methods used to infer orthology and paralogy. We survey both graph-based approaches (and their various grouping strategies) and tree-based approaches, which solve the more general problem of gene/species tree reconciliation. We discuss conceptual differences among the various orthology inference methods and databases, and examine the difficult issue of verifying and benchmarking orthology predictions. Finally, we review typical applications of orthologous genes, groups, and reconciled trees and conclude with thoughts on future methodological developments.

  • Phylogenetic and functional assessment of orthologs inference projects and methods
    PLOS Computational Biology, 2009
    Co-Authors: Adrian M Altenhoff, Christophe Dessimoz
    Abstract:

    Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However, only a few reports have compared their accuracy, and indeed, several recent efforts have not yet been systematically evaluated. Furthermore, orthology is typically only assessed in terms of function conservation, despite the phylogeny-based original definition of Fitch. We collected and mapped the results of nine leading orthology projects and methods (COG, KOG, Inparanoid, OrthoMCL, Ensembl Compara, Homologene, RoundUp, EggNOG, and OMA) and two standard methods (bidirectional best-hit and reciprocal smallest distance). We systematically compared their predictions with respect to both phylogeny and function, using six different tests. This required the mapping of millions of sequences, the handling of hundreds of millions of predicted pairs of orthologs, and the computation of tens of thousands of trees. In phylogenetic analysis or in functional analysis where high specificity is required, we find that OMA and Homologene perform best. At lower functional specificity but higher coverage level, OrthoMCL outperforms Ensembl Compara, and to a lesser extent Inparanoid. Lastly, the large coverage of the recent EggNOG can be of interest to build broad functional grouping, but the method is not specific enough for phylogenetic or detailed function analyses. In terms of general methodology, we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests. Furthermore, we show that standard bidirectional best-hit often outperforms projects with more complex algorithms. First, the present study provides guidance for the broad community of orthology data users as to which database best suits their needs. Second, it introduces new methodology to verify orthology. And third, it sets performance standards for current and future approaches.
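
The bidirectional best-hit baseline named in the abstract above is simple enough to sketch in a few lines. This is a minimal illustration, not the implementation used in the study: the score dictionaries, gene identifiers, and function names are hypothetical, and the scores stand in for whatever similarity measure (for example alignment bit scores) a real pipeline would supply.

```python
from typing import Dict, Set, Tuple

# Hypothetical input type: (query gene, target gene) -> similarity score.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring target gene.

    Ties are broken by keeping the first pair encountered.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Made-up scores for two toy genomes A and B.
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 150.0,
          ("a2", "b2"): 90.0, ("a3", "b1"): 120.0}
b_vs_a = {("b1", "a1"): 198.0, ("b1", "a3"): 118.0,
          ("b2", "a1"): 140.0, ("b2", "a2"): 95.0}

print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

In this toy data only `a1` and `b1` choose each other; `a2` picks `b2`, but `b2` prefers `a1`, so that pair is not reported. By construction the method yields at most one partner per gene.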

Adrian M Altenhoff - One of the best experts on this subject based on the ideXlab platform.

  • Orthologous Matrix (OMA) algorithm 2.0: more robust to asymmetric evolutionary rates and more scalable hierarchical orthologous group inference
    Bioinformatics, 2017
    Co-Authors: Clementmarie Train, Adrian M Altenhoff, Natasha M Glover, Gaston H Gonnet, Christophe Dessimoz
    Abstract:

    Motivation: Accurate orthology inference is a fundamental step in many phylogenetic and comparative analyses. Many methods have been proposed, including OMA (Orthologous MAtrix). Yet substantial challenges remain, in particular in coping with fragmented genes or genes evolving at different rates after duplication, and in scaling to large datasets. With more and more genomes available, it is necessary to improve the scalability and robustness of orthology inference methods. Results: We present improvements in the OMA algorithm: (i) refining the pairwise orthology inference step to account for same-species paralogs evolving at different rates, and (ii) minimizing errors in the pairwise orthology verification step by testing the consistency of pairwise distance estimates, which can be problematic in the presence of fragmentary sequences. In addition, we introduce a more scalable procedure for hierarchical orthologous group (HOG) clustering, which is several orders of magnitude faster on large datasets. Using the Quest for Orthologs consortium orthology benchmark service, we show that these changes translate into substantial improvement on multiple empirical datasets. Availability and Implementation: This new OMA 2.0 algorithm is used in the OMA database ( http://omabrowser.org ) from the March 2017 release onwards, and can be run on custom genomes using OMA standalone version 2.0 and above ( http://omabrowser.org/standalone ). Contact: christophe.dessimoz@unil.ch or adrian.altenhoff@inf.ethz.ch.

  • The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements
    Nucleic Acids Research, 2015
    Co-Authors: Adrian M Altenhoff, Natasha M Glover, Nives Skunca, Clementmarie Train
    Abstract:

    The Orthologous Matrix (OMA) project is a method and associated database inferring evolutionary relationships amongst currently 1706 complete proteomes (i.e. the protein sequence associated for every protein-coding gene in all genomes). In this update article, we present six major new developments in OMA: (i) a new web interface; (ii) Gene Ontology function predictions as part of the OMA pipeline; (iii) better support for plant genomes and in particular homeologs in the wheat genome; (iv) a new synteny viewer providing the genomic context of orthologs; (v) statically computed hierarchical orthologous groups subsets downloadable in OrthoXML format; and (vi) possibility to export parts of the all-against-all computations and to combine them with custom data for ‘client-side’ orthology prediction. OMA can be accessed through the OMA Browser and various programmatic interfaces at http://omabrowser.org.

  • Inferring orthology and paralogy
    In: UNSPECIFIED (pp. 259-279), 2012
    Co-Authors: Christophe Dessimoz, Adrian M Altenhoff
    Abstract:

    The distinction between orthologs and paralogs, genes that started diverging by speciation versus duplication, is relevant in a wide range of contexts, most notably phylogenetic tree inference and protein function annotation. In this chapter, we provide an overview of the methods used to infer orthology and paralogy. We survey both graph-based approaches (and their various grouping strategies) and tree-based approaches, which solve the more general problem of gene/species tree reconciliation. We discuss conceptual differences among the various orthology inference methods and databases, and examine the difficult issue of verifying and benchmarking orthology predictions. Finally, we review typical applications of orthologous genes, groups, and reconciled trees and conclude with thoughts on future methodological developments.

  • Phylogenetic and functional assessment of orthologs inference projects and methods
    PLOS Computational Biology, 2009
    Co-Authors: Adrian M Altenhoff, Christophe Dessimoz
    Abstract:

    Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However, only a few reports have compared their accuracy, and indeed, several recent efforts have not yet been systematically evaluated. Furthermore, orthology is typically only assessed in terms of function conservation, despite the phylogeny-based original definition of Fitch. We collected and mapped the results of nine leading orthology projects and methods (COG, KOG, Inparanoid, OrthoMCL, Ensembl Compara, Homologene, RoundUp, EggNOG, and OMA) and two standard methods (bidirectional best-hit and reciprocal smallest distance). We systematically compared their predictions with respect to both phylogeny and function, using six different tests. This required the mapping of millions of sequences, the handling of hundreds of millions of predicted pairs of orthologs, and the computation of tens of thousands of trees. In phylogenetic analysis or in functional analysis where high specificity is required, we find that OMA and Homologene perform best. At lower functional specificity but higher coverage level, OrthoMCL outperforms Ensembl Compara, and to a lesser extent Inparanoid. Lastly, the large coverage of the recent EggNOG can be of interest to build broad functional grouping, but the method is not specific enough for phylogenetic or detailed function analyses. In terms of general methodology, we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests. Furthermore, we show that standard bidirectional best-hit often outperforms projects with more complex algorithms. First, the present study provides guidance for the broad community of orthology data users as to which database best suits their needs. Second, it introduces new methodology to verify orthology. And third, it sets performance standards for current and future approaches.
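
The "Inferring orthology and paralogy" abstract above mentions graph-based approaches and their grouping strategies. One of the simplest possible strategies, shown here purely as an illustration, is to treat predicted ortholog pairs as edges and report connected components as groups. The gene names and the function name are hypothetical, and this single-linkage grouping is not claimed to be the strategy of any particular database listed on this page.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def connected_components(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group genes linked by pairwise ortholog predictions (union-find)."""
    parent: Dict[str, str] = {}

    def find(gene: str) -> str:
        # Register unseen genes as their own root, then walk to the root
        # while shortening the path (path halving).
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    # Sort members and groups so the output is deterministic.
    return sorted(sorted(members) for members in groups.values())


# Hypothetical pairwise predictions between three species.
pairs = [("human_A", "mouse_A"), ("mouse_A", "rat_A"), ("human_B", "mouse_B")]
print(connected_components(pairs))
```

A known weakness of this strategy is that a single spurious edge merges two otherwise unrelated groups, which is one reason stricter grouping criteria exist.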

Kimmen Sjolander - One of the best experts on this subject based on the ideXlab platform.

  • The PhyloFacts FAT-CAT web server: ortholog identification and function prediction using fast approximate tree classification
    Nucleic Acids Research, 2013
    Co-Authors: Cyrus Afrasiabi, Christopher Meacham, Bushra Samad, David Dineen, Kimmen Sjolander
    Abstract:

    The PhyloFacts ‘Fast Approximate Tree Classification’ (FAT-CAT) web server provides a novel approach to ortholog identification using subtree hidden Markov model-based placement of protein sequences to phylogenomic orthology groups in the PhyloFacts database. Results on a data set of microbial, plant and animal proteins demonstrate FAT-CAT’s high precision at separating orthologs and paralogs and robustness to promiscuous domains. We also present results documenting the precision of ortholog identification based on subtree hidden Markov model scoring. The FAT-CAT phylogenetic placement is used to derive a functional annotation for the query, including confidence scores and drill-down capabilities. PhyloFacts’ broad taxonomic and functional coverage, with >7.3 M proteins from across the Tree of Life, enables FAT-CAT to predict orthologs and assign function for most sequence inputs. Four pipeline parameter presets are provided to handle different sequence types, including partial sequences and proteins containing promiscuous domains; users can also modify individual parameters. PhyloFacts trees matching the query can be viewed interactively online using the PhyloScope Javascript tree viewer and are hyperlinked to various external databases. The FAT-CAT web server is available at http://phylogenomics.berkeley.edu/phylofacts/fatcat/.

  • Berkeley PHOG: PhyloFacts orthology group prediction web server
    Nucleic Acids Research, 2009
    Co-Authors: Ruchira S Datta, Christopher Meacham, Bushra Samad, Christoph Neyer, Kimmen Sjolander
    Abstract:

    Ortholog detection is essential in functional annotation of genomes, with applications to phylogenetic tree construction, prediction of protein-protein interaction and other bioinformatics tasks. We present here the PHOG web server employing a novel algorithm to identify orthologs based on phylogenetic analysis. Results on a benchmark dataset from the TreeFam-A manually curated orthology database show that PHOG provides a combination of high recall and precision competitive with both InParanoid and OrthoMCL, and allows users to target different taxonomic distances and precision levels through the use of tree-distance thresholds. For instance, OrthoMCL-DB achieved 76% recall and 66% precision on this dataset; at a slightly higher precision (68%) PHOG achieves 10% higher recall (86%). InParanoid achieved 87% recall at 24% precision on this dataset, while a PHOG variant designed for high recall achieves 88% recall at 61% precision, increasing precision by 37 percentage points over InParanoid. PHOG is based on pre-computed trees in the PhyloFacts resource, and contains over 366 K orthology groups with a minimum of three species. Predicted orthologs are linked to GO annotations, pathway information and biological literature. The PHOG web server is available at http://phylofacts.berkeley.edu/orthologs/.
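
The precision and recall figures quoted in the PHOG abstract above can be put side by side with a short calculation. The sketch below uses only the four reported operating points; the F1 score (harmonic mean of precision and recall) is not used in the abstract and is added here as one common way to summarise such a trade-off. It also shows that the precision gain of 37 over InParanoid is an absolute difference in percentage points, which corresponds to a much larger relative increase.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = {
    "OrthoMCL-DB": (0.66, 0.76),
    "PHOG (68% precision)": (0.68, 0.86),
    "InParanoid": (0.24, 0.87),
    "PHOG (high recall)": (0.61, 0.88),
}

for method, (precision, recall) in reported.items():
    print(f"{method:22s} P={precision:.2f} R={recall:.2f} "
          f"F1={f1(precision, recall):.2f}")

# Absolute versus relative precision gain of high-recall PHOG over InParanoid.
p_phog = reported["PHOG (high recall)"][0]
p_inparanoid = reported["InParanoid"][0]
absolute_points = (p_phog - p_inparanoid) * 100
relative_percent = (p_phog - p_inparanoid) / p_inparanoid * 100
print(f"absolute gain: {absolute_points:.0f} percentage points")
print(f"relative gain: {relative_percent:.0f}%")
```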

Kara Dolinski - One of the best experts on this subject based on the ideXlab platform.

  • Inferring protein function from homology using the Princeton Protein Orthology Database (P-POD)
    Current protocols in human genetics, 2011
    Co-Authors: Michael S Livstone, Rose Oughtred, Sven Heinicke, Benjamin Vernot, Curtis Huttenhower, Dannie Durand, Kara Dolinski
    Abstract:

    Inferring a protein’s function by homology is a powerful tool for biologists. The Princeton Protein Orthology Database (P-POD) offers a simple way to visualize and analyze the relationships between homologous proteins in order to infer function. P-POD contains computationally-generated analysis distinguishing orthologs from paralogs combined with curated published information on functional complementation and on human diseases. P-POD also features an applet, Notung, for users to explore and modify phylogenetic trees and generate their own ortholog/paralog calls. This unit describes how to search P-POD for precomputed data, how to find and use the associated curated information from the literature, and how to use Notung to analyze and refine the results.

  • The Princeton Protein Orthology Database (P-POD): a comparative genomics analysis tool for biologists
    PLOS ONE, 2007
    Co-Authors: Sven Heinicke, Michael S Livstone, Rose Oughtred, Charles Lu, Fan Kang, Samuel V Angiuoli, Owen White, David Botstein, Kara Dolinski
    Abstract:

    Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic relationships among predicted orthologs (based on the OrthoMCL method) to a query gene from any of eight eukaryotic organisms, and to see the orthologs in a wider evolutionary context (based on the Jaccard clustering method). In addition to the phylogenetic information, the database contains experimental results manually collected from the literature that can be compared to the computational analyses, as well as links to relevant human disease and gene information via the OMIM, model organism, and sequence databases. Our aim is for the P-POD resource to be extremely useful to typical experimental biologists wanting to learn more about the evolutionary context of their favorite genes. P-POD is based on the commonly used Generic Model Organism Database (GMOD) schema and can be downloaded in its entirety for installation on one's own system. Thus, bioinformaticians and software developers may also find P-POD useful because they can use the P-POD database infrastructure when developing their own comparative genomics resources and database tools.
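
The P-POD abstract above refers to a Jaccard clustering method without describing it. The sketch below shows only the underlying Jaccard coefficient, the size of the intersection of two sets divided by the size of their union. How P-POD builds the sets and which threshold it applies are not stated in the abstract, so the neighbour sets, gene names, and threshold here are hypothetical.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical sets of similar genes (for example, significant search hits).
neighbours = {
    "geneX": {"g1", "g2", "g3"},
    "geneY": {"g2", "g3", "g4"},
    "geneZ": {"g7", "g8"},
}

threshold = 0.4  # assumption: illustrative cut-off, not taken from P-POD

for first, second in [("geneX", "geneY"), ("geneX", "geneZ")]:
    score = jaccard(neighbours[first], neighbours[second])
    linked = score >= threshold
    print(f"{first} vs {second}: Jaccard = {score:.2f}, linked = {linked}")
```

Genes whose neighbour sets overlap strongly receive a high coefficient and would be candidates for the same cluster; genes with disjoint neighbour sets score zero.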