SEQUEST

The experts below are selected from a list of 7167 experts worldwide ranked by the ideXlab platform.

William Stafford Noble - One of the best experts on this subject based on the ideXlab platform.

  • Faster SEQUEST Searching for Peptide Identification from Tandem Mass Spectra
    Journal of Proteome Research, 2011
    Co-Authors: Benjamin J. Diament, William Stafford Noble
    Abstract:

    Computational analysis of mass spectra remains the bottleneck in many proteomics experiments. SEQUEST was one of the earliest software packages to identify peptides from mass spectra by searching a database of known peptides. Though still popular, SEQUEST performs slowly. Crux and TurboSEQUEST have successfully sped up SEQUEST by adding a precomputed index to the search, but the demand for ever-faster peptide identification software continues to grow. Tide, introduced here, is a software program that implements the SEQUEST algorithm for peptide identification and that achieves a dramatic speedup over Crux and SEQUEST. The optimization strategies detailed here employ a combination of algorithmic and software engineering techniques to achieve speeds up to 170 times faster than a recent version of SEQUEST that uses indexing. For example, on a single Xeon CPU, Tide searches 10 000 spectra against a tryptic database of 27 499 Caenorhabditis elegans proteins at a rate of 1550 spectra per second, which compares ...

  • Statistical Calibration of the SEQUEST XCorr Function
    Journal of Proteome Research, 2009
    Co-Authors: Aaron A. Klammer, Christopher Y. Park, William Stafford Noble
    Abstract:

    Obtaining accurate peptide identifications from shotgun proteomics liquid chromatography tandem mass spectrometry (LC-MS/MS) experiments requires a score function that consistently ranks correct pe...

  • A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra and SEQUEST scores.
    Journal of Proteome Research, 2003
    Co-Authors: D. C. Anderson, Donald G. Payan, William Stafford Noble
    Abstract:

    Shotgun tandem mass spectrometry-based peptide sequencing using programs such as SEQUEST allows high-throughput identification of peptides, which in turn allows the identification of corresponding proteins. We have applied a machine learning algorithm, called the support vector machine, to discriminate between correctly and incorrectly identified peptides using SEQUEST output. Each peptide was characterized by SEQUEST-calculated features such as delta Cn and Xcorr, measurements such as precursor ion current and mass, and additional calculated parameters such as the fraction of matched MS/MS peaks. The trained SVM classifier performed significantly better than previous cutoff-based methods at separating positive from negative peptides. Positive and negative peptides were more readily distinguished in training set data acquired on a QTOF, compared to an ion trap mass spectrometer. The use of 13 features, including four new parameters, significantly improved the separation between positive and negative pepti...
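
The 2003 abstract above lists the kinds of features given to the SVM, among them Xcorr, delta Cn, precursor mass, and the fraction of matched MS/MS peaks. As a rough illustration of the last of these, the following minimal sketch computes a matched-peak fraction and assembles a small feature vector. The function names, the 0.5 m/z tolerance, and the feature order are assumptions made for this example and are not taken from the paper.

```python
def fraction_matched(observed_mz, theoretical_mz, tolerance=0.5):
    """Fraction of observed peaks lying within `tolerance` of any theoretical fragment m/z.

    The tolerance value is an illustrative assumption.
    """
    if not observed_mz:
        return 0.0
    matched = sum(
        1
        for mz in observed_mz
        if any(abs(mz - t) <= tolerance for t in theoretical_mz)
    )
    return matched / len(observed_mz)


def feature_vector(xcorr, delta_cn, precursor_mass, observed_mz, theoretical_mz):
    """Assemble a hypothetical four-feature vector for one peptide-spectrum match."""
    return [
        xcorr,
        delta_cn,
        precursor_mass,
        fraction_matched(observed_mz, theoretical_mz),
    ]


if __name__ == "__main__":
    observed = [100.0, 200.2, 300.9]      # toy observed peak positions
    theoretical = [100.1, 200.0, 300.0]   # toy predicted fragment positions
    print(feature_vector(2.5, 0.3, 1200.6, observed, theoretical))
```

A vector like this could then be passed to any classifier; the paper reports using 13 features, which this sketch does not attempt to reproduce.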

David K. Han - One of the best experts on this subject based on the ideXlab platform.

  • Protein identification using Sorcerer 2 and SEQUEST.
    Current Protocols in Bioinformatics, 2009
    Co-Authors: Deborah H. Lundgren, Michael E. Wright, Harryl Martinez, David K. Han
    Abstract:

    Sage-N's Sorcerer 2 provides an integrated data analysis system for comprehensive protein identification and characterization. It runs on a proprietary version of SEQUEST(R), the most widely used search engine for identifying proteins in complex mixtures. The protocol presented here describes the basic steps performed to process mass spectrometric data with Sorcerer 2 and how to analyze results using TPP and Scaffold. The unit also provides an overview of the SEQUEST(R) algorithm, along with Sorcerer-SEQUEST(R) enhancements, and a discussion of data filtering methods, important considerations in data interpretation, and additional resources that can be of assistance to users running Sorcerer and interpreting SEQUEST(R) results.

  • Protein identification using TurboSEQUEST.
    Current Protocols in Bioinformatics, 2005
    Co-Authors: Deborah H. Lundgren, David K. Han, Jimmy K. Eng
    Abstract:

    SEQUEST is the most widely used software tool for identifying proteins in complex mixtures. It is a mature, robust program that identifies peptides directly from uninterpreted tandem mass spectra, thus making large-scale proteomic studies possible. Thermo Electron's TurboSEQUEST provides a Windows-based graphical user interface for running SEQUEST and interpreting results. The protocol in this unit describes the basic steps involved in processing mass spectrometric data and analyzing results using TurboSEQUEST. It also provides an overview of the SEQUEST algorithm and a discussion of data filtering methods, critical issues in data interpretation, and available resources that can facilitate proper interpretation of SEQUEST results. Keywords: bioinformatics; CID; MS/MS; peptide ion fragmentation; posttranslational modifications; protein identification; proteomics; SEQUEST; tandem mass spectrometry

John R. Yates - One of the best experts on this subject based on the ideXlab platform.

  • Probability-based validation of protein identifications using a modified SEQUEST algorithm.
    Analytical Chemistry, 2002
    Co-Authors: Michael J. MacCoss, John R. Yates
    Abstract:

    Database-searching algorithms compatible with shotgun proteomics match a peptide tandem mass spectrum to a predicted mass spectrum for an amino acid sequence within a database. SEQUEST is one of the most common software algorithms used for the analysis of peptide tandem mass spectra by using a cross-correlation (XCorr) scoring routine to match tandem mass spectra to model spectra derived from peptide sequences. To assess a match, SEQUEST uses the difference between the first- and second-ranked sequences (ΔCn). This value is dependent on the database size, search parameters, and sequence homologies. In this report, we demonstrate the use of a scoring routine (SEQUEST-NORM) that normalizes XCorr values to be independent of peptide size and the database used to perform the search. This new scoring routine is used to objectively calculate the percent confidence of protein identifications and posttranslational modifications based solely on the XCorr value.

  • Code developments to improve the efficiency of automated MS/MS spectra interpretation.
    Journal of Proteome Research, 2002
    Co-Authors: Rovshan G. Sadygov, Jimmy K. Eng, Michael J. MacCoss, Eberhard Durr, Anita Saraf, Hayes McDonald, John R. Yates
    Abstract:

    We report the results of our work to facilitate protein identification using tandem mass spectra and protein sequence databases. We describe a parallel version of SEQUEST (SEQUEST-PVM) that is tolerant toward arithmetic exceptions. The changes we report effectively separate search processes on slave nodes from each other. Therefore, if one of the slave nodes drops out of the cluster due to an error, the rest of the cluster will carry the search process to the end. SEQUEST has been widely used for protein identifications. The modifications made to the code improve its stability and effectiveness in a high-throughput production environment. We evaluate the overhead associated with the parallelization of SEQUEST. A prior version of software to preprocess LC/MS/MS data attempted to differentiate the charge states of ions. Singly charged ions can be accurately identified, but the software was unable to reliably differentiate tandem mass spectra of +2 and +3 charge states. We have designed and implemented a com...

  • Protein Identification by SEQUEST
    Proteome Research: Mass Spectrometry, 2001
    Co-Authors: David L. Tabb, Jimmy K. Eng, John R. Yates
    Abstract:

    Sequencing peptides was an enormously cumbersome process until the development of tandem mass spectrometry (MS/MS). The main barrier to broad use of this technique has been the difficulty of spectrum interpretation. The SEQUEST software package attempts to ease this process by automatically matching tandem mass spectra to database sequences (Eng et al. 1994). Candidate sequences are culled from the database by several simple filters. Virtual spectra are constructed for these sequences, and these are compared to the observed spectrum. Sequences yielding spectra most similar to the observed one are reported to the user.

  • Search of sequence databases with uninterpreted high-energy collision-induced dissociation spectra of peptides
    Journal of the American Society for Mass Spectrometry, 1996
    Co-Authors: John R. Yates, Jimmy K. Eng, Karl R. Clauser, Alma L. Burlingame
    Abstract:

    We have broadened the utility of the SEQUEST computer algorithms to permit correlation of uninterpreted high-energy collision-induced dissociation spectra of peptides with all sequences in a database. SEQUEST now allows for the additional fragment ion types observed under high-energy conditions. We analyzed spectra from peptides isolated following trypsin digestion of 13 proteins. SEQUEST ranked the correct sequence first for 90% (18/20) of the spectra in searches of the OWL database, without constraint by enzyme cleavage specificity or species of origin. All false-positives were flagged by the scoring system. SEQUEST searches databases for sequences that correspond to the precursor ion mass ±0.5 u. Preliminary ranking of the top 500 candidates is done by calculation of fragment ion masses for each sequence, and comparison to the measured ion masses on the basis of ion series continuity, summed ion intensity, and immonium ion presence. Final ranking is done by constructing model spectra for the 500 candidates and performing a cross-correlation analysis against the actual spectrum. Given the need to relate mounting genome sequence information with corresponding suites of proteins that comprise the cellular molecular machinery, tandem mass spectrometry appears destined to play the leading role in accelerating protein identification on the large scale required.
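
Two steps mentioned in the abstracts above lend themselves to a short illustration: selecting candidate sequences whose mass lies within ±0.5 u of the precursor mass (1996 abstract), and computing ΔCn from the first- and second-ranked scores (2002 Analytical Chemistry abstract). The sketch below is a minimal, assumption-laden version. It treats the precursor mass as a neutral monoisotopic peptide mass, and it uses the commonly quoted normalized form of ΔCn, (XCorr1 − XCorr2) / XCorr1, which the abstracts themselves do not spell out. All function names are hypothetical and none of this is SEQUEST's actual code.

```python
# Monoisotopic residue masses in unified atomic mass units (u).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O is added to the sum of residue masses


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def select_candidates(peptides, precursor_mass, tolerance=0.5):
    """Keep peptides whose mass is within `tolerance` u of the precursor mass."""
    return [
        p for p in peptides
        if abs(peptide_mass(p) - precursor_mass) <= tolerance
    ]


def delta_cn(scores):
    """Normalized gap between the best and second-best score (assumed definition)."""
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2 or ranked[0] <= 0:
        return 0.0
    return (ranked[0] - ranked[1]) / ranked[0]


if __name__ == "__main__":
    database = ["PEPTIDE", "PEPTLDE", "GGGGGGG", "AAAA"]  # toy peptide list
    target = peptide_mass("PEPTIDE")
    print(select_candidates(database, target))
    print(delta_cn([2.0, 4.0, 3.0]))
```

Because leucine and isoleucine have the same mass, both `PEPTIDE` and `PEPTLDE` pass the mass filter in this toy example; separating such candidates is left to the later scoring stages.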

Jimmy K. Eng - One of the best experts on this subject based on the ideXlab platform.

  • A fast SEQUEST cross correlation algorithm.
    Journal of Proteome Research, 2008
    Co-Authors: Jimmy K. Eng, Bernd Fischer, Jonas Grossmann, Michael J. MacCoss
    Abstract:

    The SEQUEST program was the first and remains one of the most widely used tools for assigning a peptide sequence within a database to a tandem mass spectrum. The cross correlation score is the primary score function implemented within SEQUEST and it is this score that makes the tool particularly sensitive. Unfortunately, this score is computationally expensive to calculate, and thus, to make the score manageable, SEQUEST uses a less sensitive but fast preliminary score and restricts the cross correlation to just the top 500 peptides returned by the preliminary score. Classically, the cross correlation score has been calculated using Fast Fourier Transforms (FFT) to generate the full correlation function. We describe an alternate method of calculating the cross correlation score that does not require FFTs and can be computed efficiently in a fraction of the time. The fast calculation allows all candidate peptides to be scored by the cross correlation function, potentially mitigating the need for the preliminary score, and enables an E-value significance calculation based on the cross correlation score distribution calculated on all candidate peptide sequences obtained from a sequence database.

  • Protein identification using TurboSEQUEST.
    Current Protocols in Bioinformatics, 2005
    Co-Authors: Deborah H. Lundgren, David K. Han, Jimmy K. Eng
    Abstract:

    SEQUEST is the most widely used software tool for identifying proteins in complex mixtures. It is a mature, robust program that identifies peptides directly from uninterpreted tandem mass spectra, thus making large-scale proteomic studies possible. Thermo Electron's TurboSEQUEST provides a Windows-based graphical user interface for running SEQUEST and interpreting results. The protocol in this unit describes the basic steps involved in processing mass spectrometric data and analyzing results using TurboSEQUEST. It also provides an overview of the SEQUEST algorithm and a discussion of data filtering methods, critical issues in data interpretation, and available resources that can facilitate proper interpretation of SEQUEST results. Keywords: bioinformatics; CID; MS/MS; peptide ion fragmentation; posttranslational modifications; protein identification; proteomics; SEQUEST; tandem mass spectrometry

  • Code developments to improve the efficiency of automated MS/MS spectra interpretation.
    Journal of Proteome Research, 2002
    Co-Authors: Rovshan G. Sadygov, Jimmy K. Eng, Michael J. MacCoss, Eberhard Durr, Anita Saraf, Hayes McDonald, John R. Yates
    Abstract:

    We report the results of our work to facilitate protein identification using tandem mass spectra and protein sequence databases. We describe a parallel version of SEQUEST (SEQUEST-PVM) that is tolerant toward arithmetic exceptions. The changes we report effectively separate search processes on slave nodes from each other. Therefore, if one of the slave nodes drops out of the cluster due to an error, the rest of the cluster will carry the search process to the end. SEQUEST has been widely used for protein identifications. The modifications made to the code improve its stability and effectiveness in a high-throughput production environment. We evaluate the overhead associated with the parallelization of SEQUEST. A prior version of software to preprocess LC/MS/MS data attempted to differentiate the charge states of ions. Singly charged ions can be accurately identified, but the software was unable to reliably differentiate tandem mass spectra of +2 and +3 charge states. We have designed and implemented a com...

  • Protein Identification by SEQUEST
    Proteome Research: Mass Spectrometry, 2001
    Co-Authors: David L. Tabb, Jimmy K. Eng, John R. Yates
    Abstract:

    Sequencing peptides was an enormously cumbersome process until the development of tandem mass spectrometry (MS/MS). The main barrier to broad use of this technique has been the difficulty of spectrum interpretation. The SEQUEST software package attempts to ease this process by automatically matching tandem mass spectra to database sequences (Eng et al. 1994). Candidate sequences are culled from the database by several simple filters. Virtual spectra are constructed for these sequences, and these are compared to the observed spectrum. Sequences yielding spectra most similar to the observed one are reported to the user.
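
The 2008 abstract in this list describes calculating the cross correlation score without FFTs, but does not state the method. The sketch below shows one way such a calculation can work, under an explicit assumption: that the score is the dot product at zero offset minus the mean dot product over neighbouring offsets. Under that assumption, the background term can be folded into the observed spectrum once, after which each candidate costs a single dot product. The window of 75 bins, the exclusion of the zero offset from the mean, and all function names are illustrative choices, not details confirmed by the abstract.

```python
def xcorr_direct(theoretical, observed, window=75):
    """Reference version: correlate at every offset in the window, then subtract the mean."""
    n = len(observed)

    def corr(shift):
        return sum(
            theoretical[i] * observed[i + shift]
            for i in range(n)
            if 0 <= i + shift < n
        )

    background = sum(
        corr(s) for s in range(-window, window + 1) if s != 0
    ) / (2 * window)
    return corr(0) - background


def preprocess(observed, window=75):
    """Subtract from each bin the scaled sum of its neighbours (done once per spectrum)."""
    n = len(observed)
    processed = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = sum(observed[lo:hi]) - observed[i]
        processed.append(observed[i] - neighbours / (2 * window))
    return processed


def xcorr_fast(theoretical, processed):
    """Score one candidate as a single dot product with the preprocessed spectrum."""
    return sum(t * p for t, p in zip(theoretical, processed))


if __name__ == "__main__":
    observed = [1.0, 2.0, 0.0, 4.0, 1.0]     # toy binned intensities
    theoretical = [0.0, 1.0, 0.0, 1.0, 0.0]  # toy model spectrum
    processed = preprocess(observed, window=1)
    print(xcorr_fast(theoretical, processed))
    print(xcorr_direct(theoretical, observed, window=1))
```

Both functions give the same value because the subtraction is linear in the observed spectrum. The practical gain is that `preprocess` runs once per spectrum, while `xcorr_fast` runs once per candidate peptide.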

Deborah H. Lundgren - One of the best experts on this subject based on the ideXlab platform.

  • Protein identification using Sorcerer 2 and SEQUEST.
    Current Protocols in Bioinformatics, 2009
    Co-Authors: Deborah H. Lundgren, Michael E. Wright, Harryl Martinez, David K. Han
    Abstract:

    Sage-N's Sorcerer 2 provides an integrated data analysis system for comprehensive protein identification and characterization. It runs on a proprietary version of SEQUEST(R), the most widely used search engine for identifying proteins in complex mixtures. The protocol presented here describes the basic steps performed to process mass spectrometric data with Sorcerer 2 and how to analyze results using TPP and Scaffold. The unit also provides an overview of the SEQUEST(R) algorithm, along with Sorcerer-SEQUEST(R) enhancements, and a discussion of data filtering methods, important considerations in data interpretation, and additional resources that can be of assistance to users running Sorcerer and interpreting SEQUEST(R) results.

  • Protein identification using TurboSEQUEST.
    Current Protocols in Bioinformatics, 2005
    Co-Authors: Deborah H. Lundgren, David K. Han, Jimmy K. Eng
    Abstract:

    SEQUEST is the most widely used software tool for identifying proteins in complex mixtures. It is a mature, robust program that identifies peptides directly from uninterpreted tandem mass spectra, thus making large-scale proteomic studies possible. Thermo Electron's TurboSEQUEST provides a Windows-based graphical user interface for running SEQUEST and interpreting results. The protocol in this unit describes the basic steps involved in processing mass spectrometric data and analyzing results using TurboSEQUEST. It also provides an overview of the SEQUEST algorithm and a discussion of data filtering methods, critical issues in data interpretation, and available resources that can facilitate proper interpretation of SEQUEST results. Keywords: bioinformatics; CID; MS/MS; peptide ion fragmentation; posttranslational modifications; protein identification; proteomics; SEQUEST; tandem mass spectrometry
