Pan-Genome

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 2073072 Experts worldwide ranked by ideXlab platform

Abhinay Ramaprasad - One of the best experts on this subject based on the ideXlab platform.

  • Plasmodium vinckei genomes provide insights into the pan-genome and evolution of rodent malaria parasites
    BMC Biology, 2021
    Co-Authors: Richard Culleton, Abhinay Ramaprasad, Arnab Pain, Severina Klaus, Olga Douvropoulou
    Abstract:

    Rodent malaria parasites (RMPs) serve as tractable tools to study malaria parasite biology and host-parasite-vector interactions. Among the four RMPs originally collected from wild thicket rats in sub-Saharan Central Africa and adapted to laboratory mice, Plasmodium vinckei is the most geographically widespread, with isolates collected from five separate locations. However, there is a lack of extensive phenotype and genotype data associated with this species, thus hindering its use in experimental studies. We have generated a comprehensive genetic resource for P. vinckei comprising five reference-quality genomes, one for each of its subspecies, blood-stage RNA sequencing data for five P. vinckei isolates, and genotypes and growth phenotypes for ten isolates. Additionally, we sequenced seven isolates of the RMP species Plasmodium chabaudi and Plasmodium yoelii, thus extending genotypic information for four additional subspecies and enabling a re-evaluation of the genotypic diversity and evolutionary history of RMPs. The five subspecies of P. vinckei have diverged widely from their common ancestor and have undergone large-scale genome rearrangements. Comparing P. vinckei genotypes reveals region-specific selection pressures, particularly on genes involved in mosquito transmission. Using phylogenetic analyses, we show that RMP multigene families have evolved differently across the vinckei and berghei groups of RMPs and that family-specific expansions in P. chabaudi and P. vinckei occurred in the common vinckei group ancestor prior to speciation. The erythrocyte membrane antigen 1 and fam-c families in particular show considerable expansions among the lowland forest-dwelling P. vinckei parasites. The subspecies from the highland forests of Katanga, P. v. vinckei, has a uniquely smaller genome and a reduced multigene family repertoire, and is also amenable to transfection, making it an ideal parasite for reverse genetics. We also show that P. vinckei parasites are amenable to genetic crosses. Plasmodium vinckei isolates display a large degree of phenotypic and genotypic diversity and could serve as a resource to study parasite virulence and immunogenicity. Inclusion of P. vinckei genomes provides new insights into the evolution of RMPs and their multigene families. Amenability to genetic crossing and transfection also makes them suitable for classical and functional genetics to study Plasmodium biology.

  • Plasmodium vinckei genomes provide insights into the pan-genome and evolution of rodent malaria parasites
    bioRxiv, 2020
    Co-Authors: Richard Culleton, Abhinay Ramaprasad, Arnab Pain, Severina Klaus
    Abstract:

    Background: Rodent malaria parasites (RMPs) serve as tractable tools to study malaria parasite biology and host-parasite-vector interactions. Plasmodium vinckei is the most geographically widespread of the four RMP species, with isolates collected in five countries in sub-Saharan Central Africa between the 1940s and 1970s. Several P. vinckei isolates are available but are less well characterized than other RMPs, which hampers their use as rodent malaria models. We have generated a comprehensive resource for P. vinckei comprising high-quality reference genomes, genotypes, gene expression profiles and growth phenotypes for ten P. vinckei isolates. This also allows for a comprehensive pan-genome analysis of the reference-quality genomes of RMPs. Results: Plasmodium vinckei isolates display a large degree of phenotypic and genotypic diversity and potentially constitute a valuable resource to study parasite virulence and immunogenicity. The P. vinckei subspecies have diverged widely from their common ancestor and have undergone genomic structural variations. The subspecies from Katanga, P. v. vinckei, is unique among the P. vinckei isolates, with a smaller genome size and a reduced multigene family repertoire. P. v. vinckei infections provide good schizont yields and the parasite is amenable to genetic manipulation, making it an ideal vinckei group parasite for reverse genetics. Comparing P. vinckei genotypes reveals region-specific selection pressures, particularly on genes involved in mosquito transmission. RMP multigene family expansions observed in P. chabaudi and P. vinckei occurred in their common ancestor prior to speciation. The erythrocyte membrane antigen 1 (ema1) and fam-c families have expanded considerably among the lowland forest-dwelling P. vinckei parasites; however, most of the ema1 genes are pseudogenised. Genetic crosses can be established in P. vinckei but are limited at present by low transmission success under the experimental conditions tested in this study. Conclusions: We observe significant diversity among P. vinckei isolates, making them particularly useful for the identification of genotype-phenotype relationships. Inclusion of P. vinckei genomes provides new insights into the evolution of RMPs and their multigene families. Plasmodium vinckei parasites are amenable to experimental genetic crosses and genetic manipulation, making them suitable for classical and functional genetics to study Plasmodium biology.

Arnab Pain - One of the best experts on this subject based on the ideXlab platform.

  • Plasmodium vinckei genomes provide insights into the pan-genome and evolution of rodent malaria parasites
    BMC Biology, 2021
    Co-Authors: Richard Culleton, Abhinay Ramaprasad, Arnab Pain, Severina Klaus, Olga Douvropoulou

  • Plasmodium vinckei genomes provide insights into the pan-genome and evolution of rodent malaria parasites
    bioRxiv, 2020
    Co-Authors: Richard Culleton, Abhinay Ramaprasad, Arnab Pain, Severina Klaus

Julian Parkhill - One of the best experts on this subject based on the ideXlab platform.

  • Mycobacterium tuberculosis complex lineage 5 exhibits high levels of within-lineage genomic diversity and differing gene content compared to the type strain H37Rv
    bioRxiv, 2020
    Co-Authors: Julian Parkhill, Ndira C Sanoussi, Mireia Coscolla, Boatema Oforianyinam, Isaac Darko Otchere, Martin Antonio, Stefan Niemann, Simon R Harris
    Abstract:

    Pathogens of the Mycobacterium tuberculosis complex (MTBC) are considered monomorphic, with little gene content variation between strains. Nevertheless, several genotypic and phenotypic factors separate the different MTBC lineages (L), especially L5 and L6 (traditionally termed Mycobacterium africanum), from each other. However, genome variability and gene content, especially of L5 and L6 strains, have not been fully explored and may be important for pathobiology and for current approaches to genomic analysis of MTBC isolates, including transmission studies. We compared the genomes of 358 L5 clinical isolates (3 completed genomes and 355 Illumina whole-genome-sequenced (WGS) isolates) to the L5 complete genomes and H37Rv, and identified multiple genes differentially present or absent between H37Rv and L5 strains. Additionally, considerable gene content variability was found across L5 strains, including a split in the L5.3 sublineage into L5.3.1 and L5.3.2. These gene content differences had a small knock-on effect on transmission cluster estimation, with clustering rates influenced by the selection of reference genome, and with potential over-estimation of recent transmission when using H37Rv as the reference genome. Our data show that the use of H37Rv as reference genome results in missing SNPs in genes unique to L5 strains. This potentially leads to an underestimation of the diversity present in the genome of L5 strains and in turn affects the transmission clustering rates. As such, a full capture of the gene diversity, especially for high-resolution outbreak analysis, requires a variation of the single H37Rv-centric reference genome mapping approach currently used in most WGS data analysis pipelines. Moreover, the high within-lineage gene content variability suggests that the pan-genome of M. tuberculosis is at least several kilobases larger than previously thought, implying that a concatenated or reference-free (de novo) genome assembly approach may be needed for particular questions. Data summary: Sequence data for the Illumina dataset are available at the European Genome-phenome Archive (EGA; https://www.ebi.ac.uk/ega/) under the study accession numbers PRJEB38317 and PRJEB38656. Individual run accession numbers are indicated in Table S8. PacBio raw reads for the L5 Benin genome are available at the ENA under accession SAME3170744. The assembled L5 Benin genome is available on NCBI with accession PRJNA641267. To ensure naming conventions of the genes in the three L5 genomes can be followed, we have uploaded these annotated GFF files to figshare at https://doi.org/10.6084/m9.figshare.12911849.v1. Custom python scripts used in this analysis can be found at https://github.com/conmeehan/pathophy.

  • Roary: rapid large-scale prokaryote pan genome analysis
    Bioinformatics, 2015
    Co-Authors: Andrew J Page, Martin Hunt, Carla A Cummins, Vanessa K Wong, Sandra Reuter, Matthew T G Holden, Maria Fookes, Jacqueline A Keane, Daniel Falush, Julian Parkhill
    Abstract:

    Summary: A typical prokaryote population sequencing study can now consist of hundreds or thousands of isolates. Interrogating these datasets can provide detailed insights into the genetic structure of prokaryotic genomes. We introduce Roary, a tool that rapidly builds large-scale pan genomes, identifying the core and accessory genes. Roary makes construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results. Using a single CPU, Roary can produce a pan genome consisting of 1000 isolates in 4.5 hours using 13 GB of RAM, with further speedups possible using multiple processors. Availability and implementation: Roary is implemented in Perl and is freely available under an open source GPLv3 license from http://sanger-pathogens.github.io/Roary. Contact: roary@sanger.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.

  • Roary: rapid large-scale prokaryote pan genome analysis
    bioRxiv, 2015
    Co-Authors: Andrew J Page, Martin Hunt, Carla A Cummins, Vanessa K Wong, Sandra Reuter, Matthew T G Holden, Maria Fookes, Jacqueline A Keane, Julian Parkhill
    Abstract:

    A typical prokaryote population sequencing study can now consist of hundreds or thousands of isolates. Interrogating these datasets can provide detailed insights into the genetic structure of prokaryotic genomes. We introduce Roary, a tool that rapidly builds large-scale pan genomes, identifying the core and dispensable accessory genes. Roary makes construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results. Using a single CPU, Roary can produce a pan genome consisting of 1000 isolates in 4.5 hours using 13 GB of RAM, with further speedups possible using multiple processors.

  • Defining the estimated core genome of bacterial populations using a Bayesian decision model
    PLOS Computational Biology, 2014
    Co-Authors: Andries J Van Tonder, Shilan Mistry, James E Bray, Dorothea M C Hill, Alison J Cody, Chris L Farmer, Keith P Klugman, Anne Von Gottberg, Stephen D Bentley, Julian Parkhill
    Abstract:

    The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually-informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance.
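The p-distance mentioned in the abstract above is conventionally the proportion of aligned nucleotide sites at which two sequences differ. The following is a minimal sketch of that calculation and of the per-gene median, not the authors' implementation: the function names `p_distance` and `median_p_distance` are hypothetical, and skipping gapped alignment columns is an assumption made here for illustration.

```python
from itertools import combinations
from statistics import median
from typing import Sequence


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap ('-') in either sequence are skipped.
    sites = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def median_p_distance(aligned_alleles: Sequence[str]) -> float:
    """Median p-distance over all unordered pairs of alleles of one gene."""
    if len(aligned_alleles) < 2:
        raise ValueError("need at least two aligned alleles")
    return median(p_distance(a, b)
                  for a, b in combinations(aligned_alleles, 2))
```

Computing `median_p_distance` once per core gene and plotting the values against each gene's position in a reference genome would give the kind of diversity profile the abstract describes.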

Cecile Lorrain - One of the best experts on this subject based on the ideXlab platform.

  • Dynamics of transposable elements in recently diverged fungal pathogens: lineage-specific transposable element content and efficiency of genome defences
    G3: Genes Genomes Genetics, 2021
    Co-Authors: Cecile Lorrain, Alice Feurtey, Mareike Moller, Janine Haueisen, Eva H Stukenbrock
    Abstract:

    Transposable elements (TEs) impact genome plasticity, architecture, and evolution in fungal plant pathogens. The wide range of TE content observed in fungal genomes reflects diverse efficacy of host-genome defense mechanisms that can counter-balance TE expansion and spread. Closely related species can harbor drastically different TE repertoires. The evolution of fungal effectors, which are crucial determinants of pathogenicity, has been linked to the activity of TEs in pathogen genomes. Here, we describe how TEs have shaped genome evolution of the fungal wheat pathogen Zymoseptoria tritici and four closely related species. We compared de novo TE annotations and repeat-induced point mutation (RIP) signatures in 26 genomes from the Zymoseptoria species-complex. Then, we assessed the relative insertion ages of TEs using a comparative genomics approach. Finally, we explored the impact of TE insertions on genome architecture and plasticity. The 26 genomes of Zymoseptoria species reflect different TE dynamics with a majority of recent insertions. TEs associate with accessory genome compartments, with chromosomal rearrangements, with gene presence/absence variation, and with effectors in all Zymoseptoria species. We find that the extent of RIP-like signatures varies among Z. tritici genomes compared to genomes of the sister species. The detection of reduced RIP-like signatures and of recent TE insertions in Z. tritici reflects ongoing but still moderate TE mobility.

  • Dynamics of transposable elements in recently diverged fungal pathogens: lineage-specific transposable element content and efficiency of genome defences
    bioRxiv, 2020
    Co-Authors: Cecile Lorrain, Alice Feurtey, Mareike Moller, Janine Haueisen, Eva H Stukenbrock
    Abstract:

    Transposable elements (TEs) impact genome plasticity, architecture and evolution in fungal plant pathogens. The wide range of TE content observed in fungal genomes reflects diverse efficacy of host-genome defence mechanisms that can counter-balance TE expansion and spread. Closely related species can harbour drastically different TE repertoires, suggesting variation in the efficacy of genome defences. The evolution of fungal effectors, which are crucial determinants of pathogenicity, has been linked to the activity of TEs in pathogen genomes. Here we describe how TEs have shaped genome evolution of the fungal wheat pathogen Zymoseptoria tritici and four closely related species. We compared de novo TE annotations and Repeat-Induced Point mutation signatures in thirteen genomes from the Zymoseptoria species-complex. Then, we assessed the relative insertion ages of TEs using a comparative genomics approach. Finally, we explored the impact of TE insertions on genome architecture and plasticity. The thirteen genomes of Zymoseptoria species reflect different TE dynamics with a majority of recent insertions. TEs associate with distinct genome compartments in all Zymoseptoria species, including chromosomal rearrangements, genes showing presence/absence variation and effectors. European Z. tritici isolates have reduced signatures of Repeat-Induced Point mutations compared to Iranian isolates and closely related species. Our study supports the hypothesis that ongoing but moderate TE mobility in Zymoseptoria species shapes pathogen genome evolution.

David W Ussery - One of the best experts on this subject based on the ideXlab platform.

  • Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes
    BMC Genomics, 2012
    Co-Authors: Rolf Sommer Kaas, David W Ussery, Carsten Friis, Frank Moller Aarestrup
    Abstract:

    Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree based on alignment, and a pan-genome tree based on gene presence/absence, map the relatedness of the 186 sequenced E. coli genomes. The core-gene tree displays high confidence and divides the E. coli strains into the observed MLST type clades and also separates defined phylotypes. The results of comparing a large and diverse E. coli dataset support the theory that reliable and good resolution phylogenies can be inferred from the core-genome. The results further suggest that the resolution at the isolate level may subsequently be improved by targeting more variable genes. The use of whole genome sequencing will make it possible to eliminate, or at least reduce, the need for several typing steps used in traditional epidemiology.
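The distinction the abstract above draws between a strict core (clusters in 100% of genomes) and a 'soft core' (clusters in at least 95%) can be sketched from a gene presence/absence table. This is an illustrative sketch only: the function name `summarize_pan_genome`, the input layout and the toy data are assumptions, not the method or data of the study.

```python
from typing import Dict, List, Set


def summarize_pan_genome(
    presence: Dict[str, Set[str]],
    n_genomes: int,
    soft_core_fraction: float = 0.95,
) -> Dict[str, List[str]]:
    """Split gene clusters into strict core, soft core and accessory.

    `presence` maps a gene cluster id to the set of genome ids carrying it.
    The soft core includes the strict core; accessory clusters fall below
    the soft-core threshold.
    """
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    result: Dict[str, List[str]] = {
        "strict_core": [], "soft_core": [], "accessory": []
    }
    for cluster, genomes in sorted(presence.items()):
        if len(genomes) == n_genomes:
            result["strict_core"].append(cluster)
        if len(genomes) / n_genomes >= soft_core_fraction:
            result["soft_core"].append(cluster)
        else:
            result["accessory"].append(cluster)
    return result


# Toy example with 40 hypothetical genomes.
genome_ids = [f"g{i}" for i in range(40)]
toy_presence = {
    "clusterA": set(genome_ids),       # present in all 40 genomes
    "clusterB": set(genome_ids[:39]),  # missing from one draft genome
    "clusterC": set(genome_ids[:5]),   # rare accessory cluster
}
toy_summary = summarize_pan_genome(toy_presence, n_genomes=40)
```

In the toy data, `clusterB` is lost from the strict core because of a single genome, which is the situation the soft-core threshold is meant to tolerate when some assemblies are draft quality.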

  • Comparison of 61 Sequenced Escherichia coli Genomes
    Microbial Ecology, 2010
    Co-Authors: Oksana Lukjancenko, Trudy M Wassenaar, David W Ussery
    Abstract:

    Escherichia coli is an important component of the biosphere and is an ideal model for studies of processes involved in bacterial genome evolution. Sixty-one publicly available E. coli and Shigella spp. sequenced genomes are compared, using basic methods to produce phylogenetic and proteomics trees, and to identify the pan- and core genomes of this set of sequenced strains. A hierarchical clustering of variable genes allowed clear separation of the strains into clusters, including known pathotypes; clinically relevant serotypes can also be resolved in this way. In contrast, when in silico MLST was performed, many of the various strains appear jumbled and less well resolved. The predicted pan-genome comprises 15,741 gene families, and only 993 (6%) of the families are represented in every genome, comprising the core genome. The variable or ‘accessory’ genes thus make up more than 90% of the pan-genome and about 80% of a typical genome; some of these variable genes tend to be co-localized on genomic islands. The diversity within the species E. coli, and the overlap in gene content between this and related species, suggests a continuum rather than sharp species borders in this group of Enterobacteriaceae.

  • Standard operating procedure for computing pangenome trees
    Standards in Genomic Sciences, 2010
    Co-Authors: Lars Snipen, David W Ussery
    Abstract:

    We present the pan-genome tree as a tool for visualizing similarities and differences between closely related microbial genomes within a species or genus. Distance between genomes is computed as a weighted relative Manhattan distance based on gene family presence/absence. The weights can be chosen with emphasis on groups of gene families conserved to various degrees inside the pan-genome. The software is available for free as an R-package.
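One plausible reading of the "weighted relative Manhattan distance" described above is the weighted sum of presence/absence differences divided by the sum of the weights. The sketch below follows that reading in Python rather than R; the function name `weighted_relative_manhattan` and the normalisation are assumptions for illustration and may differ from what the R package actually computes.

```python
from typing import Optional, Sequence


def weighted_relative_manhattan(
    profile_a: Sequence[int],
    profile_b: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted Manhattan distance between two 0/1 gene family profiles,
    divided by the total weight so the result lies between 0 and 1."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same gene families")
    if weights is None:
        # Assumption: equal weight for every gene family by default.
        weights = [1.0] * len(profile_a)
    if len(weights) != len(profile_a):
        raise ValueError("one weight per gene family is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_diff = sum(
        w * abs(a - b) for w, a, b in zip(weights, profile_a, profile_b)
    )
    return weighted_diff / total_weight
```

For example, the profiles `[1, 1, 0, 1]` and `[1, 0, 0, 0]` differ in two of four gene families, giving 0.5 with equal weights; raising the weight on a family that differs pulls the genomes further apart. A matrix of such pairwise distances could then be passed to a hierarchical clustering routine to draw the tree.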