PROSITE

The Experts below are selected from a list of 2076 Experts worldwide ranked by ideXlab platform

Jorg Eppinger - One of the best experts on this subject based on the ideXlab platform.

  • Mining a database of single amplified genomes from Red Sea brine pool extremophiles: improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)
    Frontiers in Microbiology, 2014
    Co-Authors: Stefan W. Grötzinger, Wail Ba Alawi, Vladimir B Bajic, Intikhab Alam, Ulrich Stingl, Jorg Eppinger
    Abstract:

    Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophiles' genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO). Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO) terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO terms and 49 consensus patterns. A subset of INDIGO sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
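
The final classification rule in the abstract above (a gene counts as reliable when at least two relevant descriptors are present) can be illustrated with a minimal Python sketch. This is not the authors' PPM script, which is distributed through the INDIGO website. The function name, the input layout (lists of gene/descriptor pairs), and the gene identifiers are assumptions made for illustration, and the sketch does not model the earlier reliability filtering or the grouping of descriptors by protein family.

```python
from collections import defaultdict


def reliable_genes(profile_hits, pattern_hits, min_descriptors=2):
    """Return genes supported by at least `min_descriptors` distinct descriptors.

    profile_hits: iterable of (gene_id, go_term) pairs (profile filter).
    pattern_hits: iterable of (gene_id, prosite_id) pairs (pattern filter).
    Both input layouts are hypothetical; the paper does not specify them.
    """
    descriptors = defaultdict(set)
    for gene_id, go_term in profile_hits:
        descriptors[gene_id].add(("GO", go_term))
    for gene_id, prosite_id in pattern_hits:
        descriptors[gene_id].add(("PROSITE", prosite_id))
    # Keep only genes with enough distinct GO terms and/or consensus patterns.
    return {
        gene_id: sorted(found)
        for gene_id, found in descriptors.items()
        if len(found) >= min_descriptors
    }


# Illustrative placeholder data: gene names and hit lists are made up.
profile_hits = [("gene_a", "GO:0004022"), ("gene_b", "GO:0004022")]
pattern_hits = [("gene_a", "PS00059"), ("gene_c", "PS00059"), ("gene_c", "PS00065")]

for gene_id, found in sorted(reliable_genes(profile_hits, pattern_hits).items()):
    print(gene_id, found)
```

In this toy input, gene_a is kept because it has one GO term and one pattern, gene_c is kept because it has two patterns, and gene_b is dropped because it has a single descriptor.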

Nicolas Hulo - One of the best experts on this subject based on the ideXlab platform.

  • PROSITE, a protein domain database for functional characterization and annotation
    Nucleic Acids Research, 2010
    Co-Authors: Christian J A Sigrist, Amos Marc Bairoch, Lorenzo Cerutti, Edouard De Castro, Petra S Langendijk-Genevaux, Virginie Bulliard, Nicolas Hulo
    Abstract:

    PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE is largely used for the annotation of domain features of UniProtKB/Swiss-Prot entries. Among the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (approximately 70%) are annotated with PROSITE descriptors using information from ProRule. In order to allow better functional characterization of domains, PROSITE developments focus on subfamily specific profiles and a new profile building method giving more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanPROSITE to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/PROSITE/.

  • ScanPROSITE: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins
    Nucleic Acids Research, 2006
    Co-Authors: Edouard De Castro, Alexandre Gattiker, Amos Marc Bairoch, Christian J A Sigrist, Petra S Langendijk-Genevaux, Virginie Bulliard, Elisabeth Gasteiger, Nicolas Hulo
    Abstract:

    ScanPROSITE (http://www.expasy.org/tools/scanPROSITE/) is a new and improved version of the web-based tool for detecting PROSITE signature matches in protein sequences. For a number of PROSITE profiles, the tool now makes use of ProRules (context-dependent annotation templates) to detect functional and structural intra-domain residues. The detection of those features enhances the power of function prediction based on profiles. Both user-defined sequences and sequences from the UniProt Knowledgebase can be matched against custom patterns, or against PROSITE signatures. To improve response times, matches of sequences from UniProtKB against PROSITE signatures are now retrieved from a pre-computed match database. Several output modes are available, including simple text views and a rich mode providing an interactive match and feature viewer with a graphical representation of results.

  • PROSITE: a documented database using patterns and profiles as motif descriptors
    Briefings in Bioinformatics, 2002
    Co-Authors: Christian J A Sigrist, Alexandre Gattiker, Amos Marc Bairoch, Lorenzo Cerutti, Nicolas Hulo, Laurent Falquet, Marco Pagni, Philipp Bucher
    Abstract:

    Among the various databases dedicated to the identification of protein families and domains, PROSITE is the first one created and has continuously evolved since. PROSITE currently consists of a large collection of biologically meaningful motifs that are described as patterns or profiles, and linked to documentation briefly describing the protein family or domain they are designed to detect. The close relationship of PROSITE with the SWISS-PROT protein database allows the evaluation of the sensitivity and specificity of the PROSITE motifs and their periodic reviewing. In return, PROSITE is used to help annotate SWISS-PROT entries. The main characteristics and the techniques of family and domain identification used by PROSITE are reviewed in this paper.

  • The PROSITE database, its status in 2002
    Nucleic Acids Research, 2002
    Co-Authors: Laurent Falquet, Christian J A Sigrist, Nicolas Hulo, Marco Pagni, Philipp Bucher, Kay Hofmann, Amos Marc Bairoch
    Abstract:

    PROSITE [Bairoch and Bucher (1994) Nucleic Acids Res., 22, 3583–3589; Hofmann et al. (1999) Nucleic Acids Res., 27, 215–219] is a method of identifying the functions of uncharacterized proteins translated from genomic or cDNA sequences. The PROSITE database (http://www.expasy.org/PROSITE/) consists of biologically significant patterns and profiles designed in such a way that with appropriate computational tools it can rapidly and reliably help to determine to which known family of proteins (if any) a new sequence belongs, or which known domain(s) it contains.

Stefan W. Grötzinger - One of the best experts on this subject based on the ideXlab platform.

  • Mining a database of single amplified genomes from Red Sea brine pool extremophiles: improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)
    Frontiers in Microbiology, 2014
    Co-Authors: Stefan W. Grötzinger, Wail Ba Alawi, Vladimir B Bajic, Intikhab Alam, Ulrich Stingl, Jorg Eppinger
    Abstract:

    Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophiles' genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO). Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO) terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO terms and 49 consensus patterns. A subset of INDIGO sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.

Amos Marc Bairoch - One of the best experts on this subject based on the ideXlab platform.

  • PROSITE, a protein domain database for functional characterization and annotation
    Nucleic Acids Research, 2010
    Co-Authors: Christian J A Sigrist, Amos Marc Bairoch, Lorenzo Cerutti, Edouard De Castro, Petra S Langendijk-Genevaux, Virginie Bulliard, Nicolas Hulo
    Abstract:

    PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE is largely used for the annotation of domain features of UniProtKB/Swiss-Prot entries. Among the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (approximately 70%) are annotated with PROSITE descriptors using information from ProRule. In order to allow better functional characterization of domains, PROSITE developments focus on subfamily specific profiles and a new profile building method giving more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanPROSITE to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/PROSITE/.

  • ScanPROSITE: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins
    Nucleic Acids Research, 2006
    Co-Authors: Edouard De Castro, Alexandre Gattiker, Amos Marc Bairoch, Christian J A Sigrist, Petra S Langendijk-Genevaux, Virginie Bulliard, Elisabeth Gasteiger, Nicolas Hulo
    Abstract:

    ScanPROSITE (http://www.expasy.org/tools/scanPROSITE/) is a new and improved version of the web-based tool for detecting PROSITE signature matches in protein sequences. For a number of PROSITE profiles, the tool now makes use of ProRules (context-dependent annotation templates) to detect functional and structural intra-domain residues. The detection of those features enhances the power of function prediction based on profiles. Both user-defined sequences and sequences from the UniProt Knowledgebase can be matched against custom patterns, or against PROSITE signatures. To improve response times, matches of sequences from UniProtKB against PROSITE signatures are now retrieved from a pre-computed match database. Several output modes are available, including simple text views and a rich mode providing an interactive match and feature viewer with a graphical representation of results.

  • PROSITE: a documented database using patterns and profiles as motif descriptors
    Briefings in Bioinformatics, 2002
    Co-Authors: Christian J A Sigrist, Alexandre Gattiker, Amos Marc Bairoch, Lorenzo Cerutti, Nicolas Hulo, Laurent Falquet, Marco Pagni, Philipp Bucher
    Abstract:

    Among the various databases dedicated to the identification of protein families and domains, PROSITE is the first one created and has continuously evolved since. PROSITE currently consists of a large collection of biologically meaningful motifs that are described as patterns or profiles, and linked to documentation briefly describing the protein family or domain they are designed to detect. The close relationship of PROSITE with the SWISS-PROT protein database allows the evaluation of the sensitivity and specificity of the PROSITE motifs and their periodic reviewing. In return, PROSITE is used to help annotate SWISS-PROT entries. The main characteristics and the techniques of family and domain identification used by PROSITE are reviewed in this paper.

  • ScanPROSITE: a reference implementation of a PROSITE scanning tool
    Applied Bioinformatics, 2002
    Co-Authors: Alexandre Gattiker, Elisabeth Gasteiger, Amos Marc Bairoch
    Abstract:

    Many different software tools are available publicly to scan the PROSITE database of protein families. However, none of them, to our knowledge, wholly implements the PROSITE syntax, or satisfies all the rules for scanning a pattern against a sequence. We hereby propose a strict definition of how a PROSITE pattern is to be scanned against a sequence, and provide a reference implementation of a tool to scan PROSITE patterns, rules and profiles against protein sequences.
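
To make the idea of scanning a pattern against a sequence concrete, here is a minimal Python sketch that translates a subset of the PROSITE pattern syntax into a regular expression and reports overlapping matches. It is not the reference implementation described in the abstract above and does not follow its strict scanning definition. The function names `prosite_to_regex` and `scan` are hypothetical, the test sequence is made up, and only these syntax elements are handled: `-` separators, `x`, `[...]` alternatives, `{...}` exclusions, `(n)` and `(n,m)` repeats, and the `<` and `>` terminus anchors at the ends of the pattern.

```python
import re

# One pattern element: a residue specification plus an optional repeat count.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern such as 'N-{P}-[ST]-{P}.' to a regex."""
    pattern = pattern.strip().rstrip(".")
    anchored_start = pattern.startswith("<")   # '<' pins the match to the N-terminus
    anchored_end = pattern.endswith(">")       # '>' pins the match to the C-terminus
    pattern = pattern.lstrip("<").rstrip(">")
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            regex = "."                          # any amino acid
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"   # any amino acid except these
        else:
            regex = residue                      # single residue or [ABC] class
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return ("^" if anchored_start else "") + "".join(parts) + ("$" if anchored_end else "")


def scan(pattern, sequence):
    """Return (start, end, matched_text) tuples using 1-based inclusive positions."""
    # The lookahead lets matches that overlap each other all be reported.
    finder = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(1) + 1, m.end(1), m.group(1)) for m in finder.finditer(sequence)]


# N-glycosylation site pattern (PROSITE PS00001) on a made-up sequence.
print(scan("N-{P}-[ST]-{P}.", "MNGTALNPSA"))
```

For variable-length elements such as `x(2,4)`, this sketch reports only the single greedy match at each start position, which is one of the places where a strict scanning definition would need to say more.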

Christian J A Sigrist - One of the best experts on this subject based on the ideXlab platform.

  • PROSITE, a protein domain database for functional characterization and annotation
    Nucleic Acids Research, 2010
    Co-Authors: Christian J A Sigrist, Amos Marc Bairoch, Lorenzo Cerutti, Edouard De Castro, Petra S Langendijk-Genevaux, Virginie Bulliard, Nicolas Hulo
    Abstract:

    PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE is largely used for the annotation of domain features of UniProtKB/Swiss-Prot entries. Among the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (approximately 70%) are annotated with PROSITE descriptors using information from ProRule. In order to allow better functional characterization of domains, PROSITE developments focus on subfamily specific profiles and a new profile building method giving more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanPROSITE to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/PROSITE/.

  • ScanPROSITE: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins
    Nucleic Acids Research, 2006
    Co-Authors: Edouard De Castro, Alexandre Gattiker, Amos Marc Bairoch, Christian J A Sigrist, Petra S Langendijk-Genevaux, Virginie Bulliard, Elisabeth Gasteiger, Nicolas Hulo
    Abstract:

    ScanPROSITE (http://www.expasy.org/tools/scanPROSITE/) is a new and improved version of the web-based tool for detecting PROSITE signature matches in protein sequences. For a number of PROSITE profiles, the tool now makes use of ProRules (context-dependent annotation templates) to detect functional and structural intra-domain residues. The detection of those features enhances the power of function prediction based on profiles. Both user-defined sequences and sequences from the UniProt Knowledgebase can be matched against custom patterns, or against PROSITE signatures. To improve response times, matches of sequences from UniProtKB against PROSITE signatures are now retrieved from a pre-computed match database. Several output modes are available, including simple text views and a rich mode providing an interactive match and feature viewer with a graphical representation of results.

  • PROSITE: a documented database using patterns and profiles as motif descriptors
    Briefings in Bioinformatics, 2002
    Co-Authors: Christian J A Sigrist, Alexandre Gattiker, Amos Marc Bairoch, Lorenzo Cerutti, Nicolas Hulo, Laurent Falquet, Marco Pagni, Philipp Bucher
    Abstract:

    Among the various databases dedicated to the identification of protein families and domains, PROSITE is the first one created and has continuously evolved since. PROSITE currently consists of a large collection of biologically meaningful motifs that are described as patterns or profiles, and linked to documentation briefly describing the protein family or domain they are designed to detect. The close relationship of PROSITE with the SWISS-PROT protein database allows the evaluation of the sensitivity and specificity of the PROSITE motifs and their periodic reviewing. In return, PROSITE is used to help annotate SWISS-PROT entries. The main characteristics and the techniques of family and domain identification used by PROSITE are reviewed in this paper.

  • The PROSITE database, its status in 2002
    Nucleic Acids Research, 2002
    Co-Authors: Laurent Falquet, Christian J A Sigrist, Nicolas Hulo, Marco Pagni, Philipp Bucher, Kay Hofmann, Amos Marc Bairoch
    Abstract:

    PROSITE [Bairoch and Bucher (1994) Nucleic Acids Res., 22, 3583–3589; Hofmann et al. (1999) Nucleic Acids Res., 27, 215–219] is a method of identifying the functions of uncharacterized proteins translated from genomic or cDNA sequences. The PROSITE database (http://www.expasy.org/PROSITE/) consists of biologically significant patterns and profiles designed in such a way that with appropriate computational tools it can rapidly and reliably help to determine to which known family of proteins (if any) a new sequence belongs, or which known domain(s) it contains.