Sequence Database

The Experts below are selected from a list of 360 Experts worldwide ranked by ideXlab platform

Rolf Apweiler - One of the best experts on this subject based on the ideXlab platform.

  • functional information in swiss prot the basis for large scale characterisation of protein Sequences
    Briefings in Bioinformatics, 2001
    Co-Authors: Rolf Apweiler
    Abstract:

    With the rapid growth of Sequence Databases, there is an increasing need for reliable functional characterisation and annotation of newly predicted proteins. To cope with such large data volumes, faster and more effective means of protein Sequence characterisation and annotation are required. One promising approach is automatic large-scale functional characterisation and annotation, which is generated with limited human interaction. However, such an approach is heavily dependent on reliable data sources. The SWISS-PROT protein Sequence Database plays an essential role here owing to its high level of functional information.

  • the swiss prot protein Sequence data bank and its supplement trembl in 1999
    Nucleic Acids Research, 1998
    Co-Authors: Amos Marc Bairoch, Rolf Apweiler
    Abstract:

    SWISS-PROT is a curated protein Sequence Database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other Databases. Recent developments of the Database include: cross-references to additional Databases; a variety of new documentation files and improvements to TrEMBL, a computer-annotated supplement to SWISS-PROT. TrEMBL consists of entries in SWISS-PROT-like format derived from the translation of all coding Sequences (CDS) in the EMBL nucleotide Sequence Database, except the CDS already included in SWISS-PROT. The URLs for SWISS-PROT on the WWW are: http://www.expasy.ch/sprot and http://www.ebi.ac.uk/sprot

  • the swiss prot protein Sequence data bank and its supplement trembl in 1999
    Nucleic Acids Research, 1998
    Co-Authors: Amos Marc Bairoch, Rolf Apweiler
    Abstract:

    SWISS-PROT is a curated protein Sequence Database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other Databases. Recent developments of the Database include format and content enhancements, cross-references to additional Databases, new documentation files and improvements to TrEMBL, a computer-annotated supplement to SWISS-PROT. TrEMBL consists of entries in SWISS-PROT-like format derived from the translation of all coding Sequences (CDSs) in the EMBL Nucleotide Sequence Database, except the CDSs already included in SWISS-PROT. We also describe the Human Proteomics Initiative (HPI), a major project to annotate all known human Sequences according to the quality standards of SWISS-PROT. SWISS-PROT is available at: http://www.expasy.ch/sprot/ and http://www.ebi.ac.uk/swissprot/

  • the swiss prot protein Sequence data bank and its supplement trembl
    Nucleic Acids Research, 1997
    Co-Authors: Amos Marc Bairoch, Rolf Apweiler
    Abstract:

    SWISS-PROT is a curated protein Sequence Database which strives to provide a high level of annotation (such as the description of the function of a protein, structure of its domains, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other Databases. Recent developments of the Database include: an increase in the number and scope of model organisms; cross-references to two additional Databases; a variety of new documentation files and the creation of TrEMBL, a computer-annotated supplement to SWISS-PROT. This supplement consists of entries in SWISS-PROT-like format derived from the translation of all coding Sequences (CDS) in the EMBL nucleotide Sequence Database, except the CDS already included in SWISS-PROT.

  • the swiss prot protein Sequence data bank and its new supplement trembl
    Nucleic Acids Research, 1996
    Co-Authors: Amos Marc Bairoch, Rolf Apweiler
    Abstract:

    SWISS-PROT is a curated protein Sequence Database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other Databases. Recent developments of the Database include: an increase in the number and scope of model organisms; cross-references to seven additional Databases; a variety of new documentation files; the creation of TREMBL, an unannotated supplement to SWISS-PROT. This supplement consists of entries in SWISS-PROT-like format derived from the translation of all coding Sequences (CDS) in the EMBL nucleotide Sequence Database, except CDS already included in SWISS-PROT.

Amos Marc Bairoch - One of the best experts on this subject based on the ideXlab platform.

  • the swiss prot protein Sequence data bank and its supplement trembl in 1999
    Nucleic Acids Research, 1998
    Co-Authors: Amos Marc Bairoch, Rolf Apweiler
    Abstract:

    SWISS-PROT is a curated protein Sequence Database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other Databases. Recent developments of the Database include: cross-references to additional Databases; a variety of new documentation files and improvements to TrEMBL, a computer-annotated supplement to SWISS-PROT. TrEMBL consists of entries in SWISS-PROT-like format derived from the translation of all coding Sequences (CDS) in the EMBL nucleotide Sequence Database, except the CDS already included in SWISS-PROT. The URLs for SWISS-PROT on the WWW are: http://www.expasy.ch/sprot and http://www.ebi.ac.uk/sprot

  • the swiss prot protein Sequence data bank and its supplement trembl in 1999
    Nucleic Acids Research, 1998
    Co-Authors: Amos Marc Bairoch, Rolf Apweiler
    Abstract:

    SWISS-PROT is a curated protein Sequence Database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other Databases. Recent developments of the Database include format and content enhancements, cross-references to additional Databases, new documentation files and improvements to TrEMBL, a computer-annotated supplement to SWISS-PROT. TrEMBL consists of entries in SWISS-PROT-like format derived from the translation of all coding Sequences (CDSs) in the EMBL Nucleotide Sequence Database, except the CDSs already included in SWISS-PROT. We also describe the Human Proteomics Initiative (HPI), a major project to annotate all known human Sequences according to the quality standards of SWISS-PROT. SWISS-PROT is available at: http://www.expasy.ch/sprot/ and http://www.ebi.ac.uk/swissprot/

  • the swiss prot protein Sequence data bank and its supplement trembl
    Nucleic Acids Research, 1997
    Co-Authors: Amos Marc Bairoch, Rolf Apweiler
    Abstract:

    SWISS-PROT is a curated protein Sequence Database which strives to provide a high level of annotation (such as the description of the function of a protein, structure of its domains, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other Databases. Recent developments of the Database include: an increase in the number and scope of model organisms; cross-references to two additional Databases; a variety of new documentation files and the creation of TrEMBL, a computer-annotated supplement to SWISS-PROT. This supplement consists of entries in SWISS-PROT-like format derived from the translation of all coding Sequences (CDS) in the EMBL nucleotide Sequence Database, except the CDS already included in SWISS-PROT.

  • the swiss prot protein Sequence data bank and its new supplement trembl
    Nucleic Acids Research, 1996
    Co-Authors: Amos Marc Bairoch, Rolf Apweiler
    Abstract:

    SWISS-PROT is a curated protein Sequence Database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other Databases. Recent developments of the Database include: an increase in the number and scope of model organisms; cross-references to seven additional Databases; a variety of new documentation files; the creation of TREMBL, an unannotated supplement to SWISS-PROT. This supplement consists of entries in SWISS-PROT-like format derived from the translation of all coding Sequences (CDS) in the EMBL nucleotide Sequence Database, except CDS already included in SWISS-PROT.

  • the swiss prot protein Sequence data bank current status
    Nucleic Acids Research, 1994
    Co-Authors: Amos Marc Bairoch, Brigitte Boeckmann
    Abstract:

    SWISS-PROT is an annotated protein Sequence Database established in 1986 and maintained collaboratively, since 1988, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library. The SWISS-PROT protein Sequence data bank consists of Sequence entries. Sequence entries are composed of different line types, each with its own format. For standardization purposes the format of SWISS-PROT follows as closely as possible that of the EMBL Nucleotide Sequence Database. A sample SWISS-PROT entry is shown in Figure 1.

Ruedi Aebersold - One of the best experts on this subject based on the ideXlab platform.

  • building consensus spectral libraries for peptide identification in proteomics
    Nature Methods, 2008
    Co-Authors: Henry H N Lam, Jimmy K Eng, Eric W Deutsch, James S Eddes, Stephen E Stein, Ruedi Aebersold
    Abstract:

    Spectral searching, based on matching experimental peptide spectra to reference spectral libraries, is gaining interest as an alternative to traditional Sequence-Database searching in mass spectrometry–based proteomics. A software tool, SpectraST, now allows users to build their own high-quality spectral libraries from raw data.

  • probid a probabilistic algorithm to identify peptides through Sequence Database searching using tandem mass spectral data
    Proteomics, 2002
    Co-Authors: Ning Zhang, Ruedi Aebersold, Benno Schwikowski
    Abstract:

    With the recent quick expansion of DNA and protein Sequence Databases, intensive efforts are underway to interpret the linear genetic information of DNA in terms of function, structure, and control of biological processes. The systematic identification and quantification of expressed proteins has proven particularly powerful in this regard. Large-scale protein identification is usually achieved by automated liquid chromatography-tandem mass spectrometry of complex peptide mixtures and Sequence Database searching of the resulting spectra [Aebersold and Goodlett, Chem. Rev. 2001, 101, 269-295]. As generating large numbers of Sequence-specific mass spectra (collision-induced dissociation/CID spectra) has become a routine operation, research has shifted from the generation of Sequence Database search results to their validation. Here we describe in detail a novel probabilistic model and score function that ranks the quality of the match between tandem mass spectral data and a peptide Sequence in a Database. We document the performance of the algorithm on a reference data set and in comparison with another Sequence Database search tool. The software is publicly available for use and evaluation at http://www.systemsbiology.org/research/software/proteomics/ProbID.

  • protein analysis by mass spectrometry and Sequence Database searching tools for cancer research in the post genomic era
    Electrophoresis, 1999
    Co-Authors: Steven P Gygi, Anneclaude Gingras, Nahum Sonenberg, Ruedi Aebersold
    Abstract:

    The post-genomic era is characterized by the deposition of Sequence information for entire genomes in Databases. Currently, besides the protein Sequences for known human proteins, there are partial Sequences from thousands more human proteins for which no biological function has been assigned. The combination of mass spectrometry and Sequence Database searching is a powerful new tool for the unambiguous identification and characterization of gel-separated proteins. This combination provides the cancer biologist with the ability to (i) identify potential protein:protein associations and (ii) fully characterize function-critical post-translational modifications, both directly from silver-stained polyacrylamide gels. In this report we describe the application of tandem mass spectrometry and Database searching to two problems which are prototypical for cancer research and indeed for biomedical research in general. The first is the identification of gel-separated, low abundance proteins based on amino acid Sequence composition following coimmunoprecipitation with the human apoptosis inhibitor protein BclXL. The second is the determination of the precise sites of phosphorylation of the human regulatory protein 4E-BP1, which controls mRNA translation.

Henry H N Lam - One of the best experts on this subject based on the ideXlab platform.

  • fast parallel tandem mass spectral library searching using gpu hardware acceleration
    Journal of Proteome Research, 2011
    Co-Authors: Lydia Ashleigh Baumgardner, Jimmy K Eng, Henry H N Lam, Avinash Kumar Shanmugam, Daniel Martin
    Abstract:

    Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its Sequence, traditionally accomplished by Sequence Database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate-limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or Sequence Database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper we present a proof-of-concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching), is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA, which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment.

  • understanding the improved sensitivity of spectral library searching over Sequence Database searching in proteomics data analysis
    Proteomics, 2011
    Co-Authors: Xin Zhang, Wenguang Shao, Henry H N Lam
    Abstract:

    Spectral library searching has been recently proposed as an alternative to Sequence Database searching for peptide identification from MS/MS. We performed a systematic comparison between spectral library searching and Sequence Database searching using a wide variety of data to better demonstrate, and understand, the superior sensitivity of the former observed in preliminary studies. By decoupling the effect of search space, we demonstrated that the success of spectral library searching is primarily attributable to the use of real library spectra for matching, without which the sensitivity advantage largely disappears. We further determined the extent to which the use of real peak intensities and non-canonical fragments, both under-utilized information in Sequence Database searching, contributes to the sensitivity advantage. Lastly, we showed that spectral library searching is disproportionately more successful in identifying low-quality spectra, and complex spectra of higher-charged precursors, both important frontiers in peptide sequencing. Our results answered important outstanding questions about this promising yet unproven method using well-controlled computational experiments and sound statistical approaches.

  • building consensus spectral libraries for peptide identification in proteomics
    Nature Methods, 2008
    Co-Authors: Henry H N Lam, Jimmy K Eng, Eric W Deutsch, James S Eddes, Stephen E Stein, Ruedi Aebersold
    Abstract:

    Spectral searching, based on matching experimental peptide spectra to reference spectral libraries, is gaining interest as an alternative to traditional Sequence-Database searching in mass spectrometry–based proteomics. A software tool, SpectraST, now allows users to build their own high-quality spectral libraries from raw data.
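
The FastPaSS abstract listed above describes the core comparison in spectral library searching as a multiplication of two vectors. The following is a minimal CPU-only sketch of that idea in Python, not the FastPaSS or SpectraST implementation: it assumes spectra are given as (m/z, intensity) pairs, bins them at a fixed width, scales each vector to unit length, and scores a pair by their dot product. The function names, the 1.0 m/z bin width, the 2000 m/z ceiling, and the entry names are all illustrative assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Convert (m/z, intensity) peaks into a fixed-length, unit-length vector.

    bin_width and max_mz are illustrative defaults, not values from any paper.
    """
    n_bins = int(max_mz / bin_width) + 1
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if 0.0 <= mz <= max_mz:
            vec[int(mz / bin_width)] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def dot_score(query_vec, library_vec):
    """Dot product of two equally binned, unit-length spectrum vectors."""
    return sum(q * l for q, l in zip(query_vec, library_vec))


def best_match(query_peaks, library):
    """Score the query against every library entry and return (name, score).

    library maps a hypothetical entry name to its list of peaks.
    """
    query_vec = bin_spectrum(query_peaks)
    scores = {
        name: dot_score(query_vec, bin_spectrum(peaks))
        for name, peaks in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy library with made-up peaks, for illustration only.
library = {
    "entry_A": [(100.2, 10.0), (250.7, 5.0)],
    "entry_B": [(300.1, 8.0), (410.5, 2.0)],
}
name, score = best_match([(100.4, 20.0), (250.1, 10.0)], library)
```

Because every query-to-library comparison is independent of the others, the loop inside `best_match` is the part that a GPU implementation would distribute across many threads.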

Timothy J Griffin - One of the best experts on this subject based on the ideXlab platform.

  • a sectioning and Database enrichment approach for improved peptide spectrum matching in large genome guided protein Sequence Databases
    Journal of Proteome Research, 2020
    Co-Authors: Praveen Kumar, Brook L Nunn, James E Johnson, Caleb Easterly, Subina Mehta, Ray Sajulga, Pratik D Jagtap, Timothy J Griffin
    Abstract:

    Multiomics approaches focused on mass spectrometry (MS)-based data, such as metaproteomics, utilize genomic and/or transcriptomic sequencing data to generate a comprehensive protein Sequence Database. These Databases can be very large, containing millions of Sequences, which reduces the sensitivity of matching tandem mass spectrometry (MS/MS) data to Sequences to generate peptide spectrum matches (PSMs). Here, we describe and evaluate a sectioning method for generating an enriched Database for those protein Sequences that are most likely present in the sample. Our evaluation demonstrates how this method helps to increase the sensitivity of PSMs while maintaining acceptable false discovery rate statistics, offering a flexible alternative to traditional large Database searching, as well as previously described two-step Database searching methods for large Sequence Database applications. Furthermore, implementation in the Galaxy platform provides access to an automated and customizable workflow for carrying out the method. Additionally, the results of this study provide valuable insights into the advantages and limitations offered by available methods aimed at addressing challenges of genome-guided, large Database applications in proteomics. Relevant raw data has been made available at https://zenodo.org/ using data set identifier "3754789" and https://arcticdata.io/catalog using data set identifier "A2VX06340".

  • a sectioning and Database enrichment approach for improved peptide spectrum matching in large genome guided protein Sequence Databases
    bioRxiv, 2019
    Co-Authors: Praveen Kumar, Brook L Nunn, James E Johnson, Caleb Easterly, Subina Mehta, Ray Sajulga, Pratik D Jagtap, Timothy J Griffin
    Abstract:

    Multi-omics approaches focused on mass-spectrometry (MS)-based data, such as metaproteomics, utilize genomic and/or transcriptomic sequencing data to generate a comprehensive protein Sequence Database. These Databases can be very large, containing millions of Sequences, which reduces the sensitivity of matching tandem mass spectrometry (MS/MS) data to Sequences to generate peptide spectrum matches (PSMs). Here, we describe a sectioning method for generating an enriched Database for those protein Sequences that are most likely present in the sample. Our evaluation demonstrates how this method helps to increase the sensitivity of PSMs while maintaining acceptable false discovery rate statistics. We demonstrate increased true positive PSM identifications using the sectioning method when compared to the traditional large Database searching method, whereas it helped in reducing the false PSM identifications when compared to a previously described two-step method for reducing Database size. The sectioning method for large Sequence Databases enables generation of an enriched protein Sequence Database and promotes increased sensitivity in identifying PSMs, while maintaining acceptable and manageable FDR. Furthermore, implementation in the Galaxy platform provides access to a usable and automated workflow for carrying out the method. Our results show the utility of this methodology for a wide range of applications where genome-guided, large Sequence Databases are required for MS-based proteomics data analysis.

  • evaluating preparative isoelectric focusing of complex peptide mixtures for tandem mass spectrometry based proteomics a case study in profiling chromatin enriched subcellular fractions in saccharomyces cerevisiae
    Analytical Chemistry, 2005
    Co-Authors: Hongwei Xie, Sricharan Bandhakavi, Timothy J Griffin
    Abstract:

    We have evaluated the use of free-flow electrophoresis, an emerging separation method for preparative isoelectric focusing of complex peptide mixtures, as a tool for high-throughput tandem mass spectrometry-based proteomic analysis. In this study, we investigated the ability of free-flow electrophoresis to resolve and fractionate complex peptide mixtures and also the effectiveness of using peptide isoelectric point in conjunction with peptide match probability scoring in Sequence Database searching. As a model system for this study, we analyzed a chromatin-enriched fraction from the yeast Saccharomyces cerevisiae. This mixture was fractionated using preparative isoelectric focusing by free-flow electrophoresis, followed by online capillary liquid chromatography electrospray tandem mass spectrometry and Sequence Database searching. Our results demonstrate that (1) FFE effectively resolves and fractionates complex peptide mixtures on the basis of peptide isoelectric point and (2) the introduction of peptide pI is effective in minimizing both false positive and false negative Sequence matches in Sequence Database searching of tandem mass spectrometry data.
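
The abstract above reports that combining peptide isoelectric point (pI) with match probability scoring reduces both false positive and false negative Sequence matches. The abstract does not give the decision rule, so the Python sketch below is only one plausible way to combine the two signals: a candidate match is kept when its probability clears a threshold and its predicted pI lies close to the pI of the fraction it was observed in. The `Candidate` type, the `filter_by_pi` function, both thresholds, and the example values are hypothetical, and the pI prediction itself is assumed to come from a separate tool.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str              # candidate peptide Sequence from the search
    match_probability: float  # probability score reported by the search tool
    predicted_pi: float       # pI predicted for the peptide (predictor not shown)


def filter_by_pi(candidates, fraction_pi, pi_tolerance=0.5, min_probability=0.5):
    """Keep candidates whose predicted pI agrees with the fraction pI.

    pi_tolerance and min_probability are illustrative values only.
    """
    kept = []
    for c in candidates:
        pi_agrees = abs(c.predicted_pi - fraction_pi) <= pi_tolerance
        if pi_agrees and c.match_probability >= min_probability:
            kept.append(c)
    return kept


# Made-up candidates observed in a fraction focused at pI 4.5.
candidates = [
    Candidate("PEPTIDEA", 0.95, 4.6),  # high score, pI agrees: kept
    Candidate("PEPTIDEB", 0.97, 8.9),  # high score, pI disagrees: rejected
    Candidate("PEPTIDEC", 0.60, 4.3),  # modest score, pI agrees: kept
    Candidate("PEPTIDED", 0.20, 4.4),  # score too low: rejected
]
accepted = [c.peptide for c in filter_by_pi(candidates, fraction_pi=4.5)]
```

In this toy setup the pI check rejects a high-scoring match with an inconsistent pI, while the relaxed probability threshold retains a modest-scoring match whose pI is consistent, which mirrors the two effects the abstract describes.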