Shotgun Proteomics

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 7662 Experts worldwide ranked by ideXlab platform

Michael J Maccoss - One of the best experts on this subject based on the ideXlab platform.

  • Comparison of data acquisition strategies on quadrupole ion trap instrumentation for Shotgun Proteomics
    Journal of the American Society for Mass Spectrometry, 2014
    Co-Authors: Jesse D. Canterbury, Michael J Maccoss, Gennifer E Merrihew, David R Goodlett, Scott A Shaffer
    Abstract:

    The most common data collection strategy in Shotgun Proteomics is data-dependent acquisition (DDA), a process driven by an automated instrument control routine that directs MS/MS acquisition from the most abundant signals to the least. An alternative to DDA is data-independent acquisition (DIA), a process in which a specified m/z range is fragmented without regard to prioritization of a precursor ion or its relative abundance in the mass spectrum, thus potentially offering a more comprehensive analysis of peptides than DDA. In this work, we evaluate both DDA and DIA on three different linear ion trap instruments: an LTQ, an LTQ modified with an electrodynamic ion funnel, and an LTQ Velos. These instruments represent both older (LTQ) and newer (LTQ Velos) ion trap designs (i.e., linear versus dual ion traps, respectively), and allow direct comparison of peptide identifications using both DDA and DIA analysis. Further, as the LTQ Velos has an enhanced “S-lens” ion guide to improve ion flux, we found it logical to determine whether the sensitivity of the older LTQ model could be improved by fitting it with an electrodynamic ion guide of significantly different design from the S-lens. We find that the ion-funnel-enabled LTQ identifies more proteins in the insoluble fraction of a yeast lysate than the other two instruments in DIA mode, whereas the faster-scanning LTQ Velos performs better in DDA mode. We explore reasons for these results, including differences in scan speed, source ion optics, and linear ion trap design.
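The contrast between the two acquisition modes can be sketched in a few lines. The function names, the top-N of 2, and the 25 m/z window width below are illustrative choices, not parameters from the study:

```python
def dda_select(survey_scan, top_n=3, exclusion=None):
    """Data-dependent acquisition: pick the top-N most intense
    precursors from a survey (MS1) scan for fragmentation."""
    exclusion = exclusion or set()
    candidates = [(mz, i) for mz, i in survey_scan if mz not in exclusion]
    candidates.sort(key=lambda p: p[1], reverse=True)  # sort by intensity
    return [mz for mz, _ in candidates[:top_n]]

def dia_windows(mz_lo=400.0, mz_hi=1200.0, width=25.0):
    """Data-independent acquisition: fragment fixed m/z windows
    regardless of which precursors happen to be present."""
    lo = mz_lo
    while lo < mz_hi:
        yield (lo, min(lo + width, mz_hi))
        lo += width

# toy survey scan: (m/z, intensity) pairs
scan = [(450.7, 1e6), (612.3, 5e4), (799.9, 2e5)]
print(dda_select(scan, top_n=2))   # -> [450.7, 799.9]
print(len(list(dia_windows())))    # -> 32 isolation windows
```

The key difference is visible in the interfaces: `dda_select` depends on the observed intensities, while `dia_windows` ignores the scan content entirely.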

  • Estimating relative abundances of proteins from Shotgun Proteomics data
    BMC Bioinformatics, 2012
    Co-Authors: Sean Mcilwain, Michael J Maccoss, Michael Mathews, Michael S Bereman, Edwin W Rubel, William Stafford Noble
    Abstract:

    Background: Spectral counting methods provide an easy means of identifying proteins with differing abundances between complex mixtures using Shotgun Proteomics data. The crux spectral-counts command, implemented as part of the Crux software toolkit, implements four previously reported spectral counting methods, the spectral index (SIN), the exponentially modified protein abundance index (emPAI), the normalized spectral abundance factor (NSAF), and the distributed normalized spectral abundance factor (dNSAF). Results: We compared the reproducibility and the linearity relative to each protein’s abundance of the four spectral counting metrics. Our analysis suggests that NSAF yields the most reproducible counts across technical and biological replicates, and both SIN and NSAF achieve the best linearity. Conclusions: With the crux spectral-counts command, Crux provides open-source modular methods to analyze mass spectrometry data for identifying and now quantifying peptides and proteins. The C++ source code, compiled binaries, spectra and sequence databases are available at http://noble.gs.washington.edu/proj/cruxspectral-counts.
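As a concrete illustration of one of the four metrics, a protein's NSAF is its spectral count divided by its length, normalized so the values sum to one across the run. A minimal sketch with hypothetical protein names:

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor: spectral count divided by
    protein length, normalized across all proteins in the run."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}

counts  = {"P1": 10, "P2": 40, "P3": 10}    # spectral counts
lengths = {"P1": 100, "P2": 400, "P3": 200}  # protein lengths (residues)
values = nsaf(counts, lengths)
print({p: round(v, 2) for p, v in sorted(values.items())})
# -> {'P1': 0.4, 'P2': 0.4, 'P3': 0.2}
```

Note that P1 and P2 receive the same NSAF despite a fourfold difference in raw counts, because the length normalization corrects for the fact that longer proteins yield more peptides.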

  • High-speed data reduction, feature detection, and MS/MS spectrum quality assessment of Shotgun Proteomics data sets using high-resolution mass spectrometry
    Analytical Chemistry, 2007
    Co-Authors: Michael R Hoopmann, Gregory L Finney, Michael J Maccoss
    Abstract:

    Advances in Fourier transform mass spectrometry have made the acquisition of high-resolution and accurate mass measurements routine on a chromatographic time scale. Here we report an algorithm, Hardklor, for the rapid and robust analysis of high-resolution mass spectra acquired in Shotgun Proteomics experiments. Our algorithm is demonstrated in the analysis of an Escherichia coli enriched membrane fraction. The mass spectrometry data of the respective peptides are acquired by microcapillary HPLC on an LTQ-Orbitrap mass spectrometer with data-dependent acquisition of MS/MS spectra. Hardklor detects 211,272 total peptide isotope distributions over a 2-h analysis (75-min gradient) in only a small fraction of the time required to acquire the data. From these data there are 13,665 distinct, chromatographically persistent peptide isotope distributions. Hardklor is also used to assess the quality of the product ion spectra and finds that more than 11.2% of the MS/MS spectra are composed of fragment ions from mul...

  • Semi-supervised learning for peptide identification from Shotgun Proteomics datasets
    Nature Methods, 2007
    Co-Authors: Lukas Käll, Jesse D. Canterbury, William Stafford Noble, Jason Weston, Michael J Maccoss
    Abstract:

    Shotgun Proteomics uses liquid chromatography-tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.
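The semi-supervised idea can be sketched as follows, with a toy centroid-difference learner standing in for Percolator's actual SVM; the feature vectors, the top-half confidence cut, and all names are illustrative:

```python
def percolator_sketch(targets, decoys, n_iter=3):
    """Semi-supervised reranking in the spirit of Percolator: decoy
    PSMs serve as known negatives, while high-ranking target PSMs are
    treated as provisional positives. A linear scorer is refit each
    iteration and all PSMs are rescored with it. Each PSM is a feature
    vector; the centroid-difference learner is a toy stand-in."""
    def score(w, x):
        return sum(wi * xi for wi, xi in zip(w, x))
    # start by ranking on the first feature alone (e.g. the raw search score)
    w = [1.0] + [0.0] * (len(targets[0]) - 1)
    for _ in range(n_iter):
        ranked = sorted(targets, key=lambda x: score(w, x), reverse=True)
        positives = ranked[: max(1, len(ranked) // 2)]  # toy confidence cut
        dims = len(w)
        mu_pos = [sum(x[d] for x in positives) / len(positives) for d in range(dims)]
        mu_neg = [sum(x[d] for x in decoys) / len(decoys) for d in range(dims)]
        w = [p - n for p, n in zip(mu_pos, mu_neg)]  # refit linear weights
    return w

targets = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
decoys  = [[0.0, 0.0], [1.0, 0.0]]
w = percolator_sketch(targets, decoys)
```

The learned weights combine all features, so PSMs that score modestly on the raw search score but well on secondary features can be promoted, which is the source of the reported gains.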

Lukas Käll - One of the best experts on this subject based on the ideXlab platform.

  • Focus on the spectra that matter by clustering of quantification data in Shotgun Proteomics
    Nature Communications, 2020
    Co-Authors: Lukas Käll
    Abstract:

    In Shotgun Proteomics, the analysis of label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally causes a low sensitivity in differential expression analysis. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow, thereby preventing valuable information from being discarded in the identification stage. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering on both the MS1 and MS2 levels to summarize all analytes of interest without assigning identities; the resulting data reduction also shortens search times. We can now employ open modification and de novo searches to identify analytes of interest that would have gone unnoticed in traditional pipelines. Quandenser+Triqler outperforms the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins for all tested datasets. Software is available for all major operating systems at https://github.com/statisticalbiotechnology/quandenser, under Apache 2.0 license.
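A greedy sketch of the quantification-first idea: group MS1 features by m/z and retention-time tolerance before any identification is attempted. The tolerances and the greedy strategy are illustrative, not Quandenser's actual clustering algorithm:

```python
def cluster_features(features, mz_tol=0.01, rt_tol=0.5):
    """Group MS1 features (m/z, retention time in minutes) into
    clusters of co-eluting, mass-matching observations, deferring
    identification until after quantification. Greedy toy version."""
    clusters = []
    for mz, rt in sorted(features):
        for c in clusters:
            cmz, crt = c[0]  # compare against the cluster seed
            if abs(mz - cmz) <= mz_tol and abs(rt - crt) <= rt_tol:
                c.append((mz, rt))
                break
        else:
            clusters.append([(mz, rt)])  # start a new cluster
    return clusters

features = [(500.000, 10.0), (500.005, 10.2), (600.000, 20.0)]
print(len(cluster_features(features)))  # -> 2 clusters
```

Only the cluster consensus (not every raw feature) then needs to be searched, which is where the search-time saving comes from.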

  • Integrated identification and quantification error probabilities for Shotgun Proteomics
    Molecular & Cellular Proteomics, 2019
    Co-Authors: Lukas Käll
    Abstract:

    Protein quantification by label-free Shotgun Proteomics experiments is plagued by a multitude of error sources. Typical pipelines for identifying differential proteins use intermediate filters to control the error rate. However, they often ignore certain error sources and, moreover, regard filtered lists as completely correct in subsequent steps. These two indiscretions can easily lead to a loss of control of the false discovery rate (FDR). We propose a probabilistic graphical model, Triqler, that propagates error information through all steps, employing distributions in favor of point estimates, most notably for missing value imputation. The model outputs posterior probabilities for fold changes between treatment groups, highlighting uncertainty rather than hiding it. We analyzed 3 engineered data sets and achieved FDR control and high sensitivity, even for truly absent proteins. In a bladder cancer clinical data set we discovered 35 proteins at 5% FDR, whereas the original study discovered 1 and MaxQuant/Perseus 4 proteins at this threshold. Compellingly, these 35 proteins showed enrichment for functional annotation terms, whereas the top ranked proteins reported by MaxQuant/Perseus showed no enrichment. The model executes in minutes and is freely available at https://pypi.org/project/triqler/.
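The shift from point estimates to distributions can be illustrated with a toy grid posterior over the log2 fold change; the Gaussian likelihood, flat prior, and fixed sigma below are simplifying assumptions of this sketch, not Triqler's actual graphical model:

```python
import math

def posterior_fold_change(group_a, group_b, sigma=0.5):
    """Toy posterior over the log2 fold change between two treatment
    groups: Gaussian likelihood, flat prior on a grid. Illustrates
    reporting a distribution instead of a single point estimate."""
    grid = [x / 100.0 for x in range(-500, 501)]  # candidate log2 FCs
    mean_a = sum(group_a) / len(group_a)
    post = []
    for fc in grid:
        # log-likelihood of group_b if its true mean is mean_a + fc
        ll = sum(-((y - (mean_a + fc)) ** 2) / (2 * sigma ** 2)
                 for y in group_b)
        post.append(math.exp(ll))
    z = sum(post)
    return grid, [p / z for p in post]

# toy log2 intensities of one protein in two treatment groups
grid, post = posterior_fold_change([10.0, 10.2], [11.0, 11.1])
map_fc = grid[post.index(max(post))]  # posterior mode of the fold change
```

Because the whole posterior is available, one can report, for example, the probability that the fold change exceeds a biologically meaningful threshold, rather than a filtered yes/no call.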

  • Focus on the spectra that matter by clustering of quantification data in Shotgun Proteomics
    2018
    Co-Authors: Lukas Käll
    Abstract:

    In Shotgun Proteomics, the information extractable from label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally causes a low sensitivity in differential expression analysis on protein level. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow. This prevents valuable information from being discarded prematurely in the identification stage and allows us to spend more effort on the identification process. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering on both MS1 and MS2 level to summarize all analytes of interest without assigning identities. Not only does this eliminate the need for redoing the quantification for each new set of search parameters and engines, but it also reduces search time due to the data reduction by MS2 clustering. For a dataset of partially known composition, we could now employ open modification and de novo searches to identify analytes of interest that would have gone unnoticed in traditional pipelines. Moreover, Quandenser reports error rates for feature matching, which we integrated into our probabilistic protein quantification method, Triqler. This propagates error probabilities from feature to protein level and appropriately deals with the noise in quantitative signals caused by false positives and missing values. Quandenser+Triqler outperformed the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins at 5% FDR: 123 vs. 117 true positives with 2 vs. 25 false positives in a dataset of partially known composition; 62 vs. 3 proteins in a bladder cancer set; 8 vs. 0 proteins in a hepatic fibrosis set; and 872 vs. 661 proteins in a nanoscale type 1 diabetes set. 
Compellingly, in all three clinical datasets investigated, the differentially abundant proteins showed enrichment for functional annotation terms. The source code and binary packages for all major operating systems are available from https://github.com/statisticalbiotechnology/quandenser, under Apache 2.0 license.

  • Integrated identification and quantification error probabilities for Shotgun Proteomics
    bioRxiv, 2018
    Co-Authors: Lukas Käll
    Abstract:

    Protein quantification by label-free Shotgun Proteomics experiments is plagued by a multitude of error sources. Typical pipelines for identifying differentially expressed proteins use intermediate filters in an attempt to control the error rate. However, they often ignore certain error sources and, moreover, regard filtered lists as completely correct in subsequent steps. These two indiscretions can easily lead to a loss of control of the false discovery rate (FDR). We propose a probabilistic graphical model, Triqler, that propagates error information through all steps, employing distributions in favor of point estimates, most notably for missing value imputation. The model outputs posterior probabilities for fold changes between treatment groups, highlighting uncertainty rather than hiding it. We analyzed 3 engineered datasets and achieved FDR control and high sensitivity, even for truly absent proteins. In a bladder cancer clinical dataset we discovered 35 proteins at 5% FDR, with the original study discovering none at this threshold. Compellingly, these proteins showed enrichment for functional annotation terms. The model executes in minutes and is freely available at https://pypi.org/project/triqler/.

  • How to talk about protein-level false discovery rates in Shotgun Proteomics
    Proteomics, 2016
    Co-Authors: Ayesha Tasnim, Lukas Käll
    Abstract:

    A frequently sought output from a Shotgun Proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard techniqu ...

Mikhail V Gorshkov - One of the best experts on this subject based on the ideXlab platform.

  • Scavager: a versatile postsearch validation algorithm for Shotgun Proteomics based on gradient boosting
    Proteomics, 2019
    Co-Authors: Lev I Levitsky, Mark V Ivanov, Julia A Bubis, Mikhail V Gorshkov
    Abstract:

    Shotgun Proteomics workflows for database protein identification typically include a combination of search engines and postsearch validation software based mostly on machine learning algorithms. Here we present a new postsearch validation tool, Scavager, which employs CatBoost, an open-source gradient boosting library, and shows improved efficiency compared with other popular algorithms such as Percolator, PeptideProphet, and Q-ranker. The comparison is done using multiple data sets and search engines, including MSGF+, MSFragger, X!Tandem, Comet, and the recently introduced IdentiPy. Implemented in the Python programming language, Scavager is open-source and freely available at https://bitbucket.org/markmipt/scavager.

  • IdentiPy: an extensible search engine for protein identification in Shotgun Proteomics
    Journal of Proteome Research, 2018
    Co-Authors: Anna A Lobas, Marina L Pridatchenko, Mark V Ivanov, Lev I Levitsky, Julia A Bubis, Irina A Tarasova, Elizaveta M Solovyeva, Mikhail V Gorshkov
    Abstract:

    We present an open-source, extensible search engine for Shotgun Proteomics. Implemented in the Python programming language, IdentiPy shows competitive processing speed and sensitivity compared with state-of-the-art search engines. It is equipped with a user-friendly web interface, IdentiPy Server, enabling the use of a single server installation accessed from multiple workstations. Using a simplified version of the X!Tandem scoring algorithm and its novel “autotune” feature, IdentiPy outperforms the popular alternatives on high-resolution data sets. Autotune adjusts the search parameters for the particular data set, resulting in improved search efficiency and a simplified user experience. IdentiPy with the autotune feature shows higher sensitivity compared with the evaluated search engines. IdentiPy Server has built-in postprocessing and protein inference procedures and provides graphic visualization of the statistical properties of the data set and the search results. It is open-source and can be freely e...

  • Comparative evaluation of label-free quantification methods for Shotgun Proteomics
    Rapid Communications in Mass Spectrometry, 2017
    Co-Authors: Mikhail V Gorshkov, Mark V Ivanov, Lev I Levitsky, Julia A Bubis, Irina A Tarasova
    Abstract:

    Rationale: Label-free quantification (LFQ) is a popular strategy for Shotgun Proteomics. A variety of LFQ algorithms have been developed recently. However, a comprehensive comparison of the most commonly used LFQ methods is still rare, in part due to a lack of clear metrics for their evaluation and of an annotated, quantitatively well-characterized data set. Methods: Five LFQ methods were compared: the spectral counting based algorithms SIN, emPAI, and NSAF, and approaches relying on extracted ion chromatogram (XIC) intensities, MaxLFQ and Quanti. We used three criteria for performance evaluation: coefficient of variation (CV) of protein abundances between replicates; analysis of variance (ANOVA); and the root-mean-square error of logarithmized calculated concentration ratios, referred to as standard quantification error (SQE). Comparison was performed using a quantitatively annotated, publicly available data set. Results: The best results in terms of inter-replicate reproducibility were observed for MaxLFQ and NSAF, although they exhibited larger standard quantification errors. Using NSAF, all quantitatively annotated proteins were correctly identified in the Bonferroni-corrected results of the ANOVA test. SIN was found to be the most accurate in terms of SQE. Finally, the current implementations of XIC-based LFQ methods did not outperform the methods based on spectral counting for the data set used in this study. Conclusions: Surprisingly, the performance of XIC-based approaches measured using three independent metrics was found to be comparable with that of the more straightforward MS/MS-based spectral counting approaches. The study revealed no clear leader among the latter.
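The first evaluation criterion, the coefficient of variation between replicates, is straightforward to compute; a minimal sketch with made-up abundance values:

```python
import statistics

def cv(values):
    """Coefficient of variation: standard deviation relative to the
    mean of a protein's abundance across replicates. Lower values
    indicate more reproducible quantification."""
    return statistics.stdev(values) / statistics.mean(values)

# hypothetical protein abundances across three technical replicates
replicates = [1.00e6, 1.10e6, 0.95e6]
print(round(cv(replicates), 3))
```

Because CV is dimensionless, it allows the spectral-counting and XIC-based methods to be compared on the same scale despite their very different raw units.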

  • Unbiased false discovery rate estimation for Shotgun Proteomics based on the target-decoy approach
    Journal of Proteome Research, 2017
    Co-Authors: Anna A Lobas, Mikhail V Gorshkov, Mark V Ivanov, Lev I Levitsky
    Abstract:

    The target-decoy approach (TDA) is the dominant strategy for false discovery rate (FDR) estimation in mass-spectrometry-based Proteomics. One of its main applications is direct FDR estimation based on counting of decoy matches above a certain score threshold. The corresponding equations are widely employed for filtering of peptide or protein identifications. In this work we consider a probability model describing the filtering process and find that, when decoy counting is used for q value estimation and subsequent filtering, a correction has to be introduced into these common equations for TDA-based FDR estimation. We also discuss the scale of variance of the false discovery proportion (FDP) and propose using confidence intervals for more conservative FDP estimation in Shotgun Proteomics. The necessity of both the correction and the use of confidence intervals is especially pronounced when filtering small sets (such as in proteogenomics experiments) and when using very low FDR thresholds.
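The baseline TDA estimate that such corrections refine is simply the ratio of decoy to target matches above a score threshold. A sketch, where the +1 variant shown is one common conservative correction rather than the paper's own derived equations:

```python
def tda_fdr(target_scores, decoy_scores, threshold):
    """Target-decoy FDR estimate at a score threshold: the number of
    decoy matches above the threshold approximates the number of false
    target matches. Returns (naive, +1-corrected) estimates."""
    t = sum(s >= threshold for s in target_scores)
    d = sum(s >= threshold for s in decoy_scores)
    naive = d / t if t else 0.0
    corrected = (d + 1) / t if t else 0.0  # conservative variant
    return naive, corrected

# toy PSM scores from a target search and a decoy search
targets = [9.1, 8.7, 7.9, 6.5, 5.2, 3.3]
decoys  = [4.0, 3.5, 2.1, 1.8, 1.0, 0.6]
print(tda_fdr(targets, decoys, threshold=5.0))  # -> (0.0, 0.2)
```

With only five accepted targets, the two estimates differ by 20 percentage points, illustrating the paper's point that corrections matter most for small sets and low thresholds.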

  • Empirical multidimensional space for scoring peptide spectrum matches in Shotgun Proteomics
    Journal of Proteome Research, 2014
    Co-Authors: Anna A Lobas, Tanja Panic, Unige A Laskay, Goran Mitulovic, Rainer Schmid, Marina L Pridatchenko, Yury O Tsybin, Mikhail V Gorshkov
    Abstract:

    Data-dependent tandem mass spectrometry (MS/MS) is one of the main techniques for protein identification in Shotgun Proteomics. In a typical LC–MS/MS workflow, peptide product ion mass spectra (MS/MS spectra) are compared with those derived theoretically from a protein sequence database. Scoring of these matches results in peptide identifications. A set of peptide identifications is characterized by false discovery rate (FDR), which determines the fraction of false identifications in the set. The total number of peptides targeted for fragmentation is in the range of 10,000 to 20,000 for a several-hour LC–MS/MS run. Typically, <50% of these MS/MS spectra result in peptide-spectrum matches (PSMs). A small fraction of PSMs pass the preset FDR level (commonly 1%) giving a list of identified proteins, yet a large number of correct PSMs corresponding to the peptides originally present in the sample are left behind in the “grey area” below the identity threshold. Following the numerous efforts to recover these c...

Michael P Washburn - One of the best experts on this subject based on the ideXlab platform.

  • Advances in Shotgun Proteomics and the analysis of membrane proteomes
    Journal of Proteomics, 2010
    Co-Authors: Joshua M Gilmore, Michael P Washburn
    Abstract:

    The emergence of Shotgun Proteomics has facilitated the numerous biological discoveries made by proteomic studies. However, comprehensive proteomic analysis remains challenging, and Shotgun Proteomics is a continually changing field. This review details recent developments in Shotgun Proteomics and describes emerging technologies that will influence Shotgun Proteomics going forward. In addition, proteomic studies of integral membrane proteins remain challenging due to the hydrophobic nature of integral membrane proteins and their generally low abundance levels. However, many strategies have been developed for enriching, isolating, and separating membrane proteins for proteomic analysis that have moved this field forward. In summary, while Shotgun Proteomics is a widely used and mature technology, the continued pace of improvements in mass spectrometry and proteomic technology and methods indicates that future studies will have an even greater impact on biological discovery.

  • Statistical similarities between transcriptomics and quantitative Shotgun Proteomics data
    Molecular & Cellular Proteomics, 2008
    Co-Authors: Norman Pavelka, Marjorie Fournier, Laurence Florens, Selene K Swanson, Mattia Pelizzola, Paola Ricciardicastagnoli, Michael P Washburn
    Abstract:

    If the large collection of microarray-specific statistical tools was applicable to the analysis of quantitative Shotgun Proteomics datasets, it would certainly foster an important advancement of Proteomics research. Here we analyze two large multidimensional protein identification technology datasets, one containing eight replicates of the soluble fraction of a yeast whole-cell lysate and one containing nine replicates of a human immunoprecipitate, to test whether normalized spectral abundance factor (NSAF) values share substantially similar statistical properties with transcript abundance values from Affymetrix GeneChip data. First we show similar dynamic range and distribution properties of these two types of numeric values. Next we show that the standard deviation (S.D.) of a protein's NSAF values was dependent on the average NSAF value of the protein itself, following a power law. This relationship can be modeled by a power law global error model (PLGEM), initially developed to describe the variance-versus-mean dependence that exists in GeneChip data. PLGEM parameters obtained from NSAF datasets proved to be surprisingly similar to the typical parameters observed in GeneChip datasets. The most important common feature identified by this approach was that, although in absolute terms the S.D. of replicated abundance values increases as a function of increasing average abundance, the coefficient of variation, a relative measure of variability, becomes progressively smaller under the same conditions. We next show that PLGEM parameters were reasonably stable to decreasing numbers of replicates. We finally illustrate one possible application of PLGEM in the identification of differentially abundant proteins that might potentially outperform standard statistical tests. In summary, we believe that this body of work lays the foundation for the application of microarray-specific tools in the analysis of NSAF datasets.
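The variance-versus-mean power law described above can be fit by ordinary least squares in log-log space; a sketch on exact synthetic data, where the parameters 0.5 and 0.8 are made up for illustration and not taken from PLGEM:

```python
import math

def fit_power_law(means, sds):
    """Fit sd = a * mean^b by least squares in log-log space, the
    relationship PLGEM models between a protein's average abundance
    and the standard deviation of its replicated measurements."""
    xs = [math.log(m) for m in means]
    ys = [math.log(s) for s in sds]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# synthetic data generated exactly from sd = 0.5 * mean^0.8
means = [10.0, 100.0, 1000.0]
sds = [0.5 * m ** 0.8 for m in means]
a, b = fit_power_law(means, sds)
```

An exponent b below 1 is exactly the behavior the abstract describes: absolute S.D. grows with abundance while the coefficient of variation (sd/mean, proportional to mean^(b-1)) shrinks.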

  • Multidimensional separations-based Shotgun Proteomics
    ChemInform, 2007
    Co-Authors: Marjorie Fournier, Joshua M Gilmore, Skylar Martinbrown, Michael P Washburn
    Abstract:

    The proteome refers to the collection of proteins in a given biological organism or system under a particular set of environmental conditions. The study of proteins, referred to as Proteomics, is performed to identify the components of a particular proteome and analyze global changes in protein expression in response to different stimuli. This leads to an understanding of physiological and pathological states of an organism through a comprehensive analysis of biological processes. From a protein complex to a whole cell, proteome analysis deals with highly complex mixtures, requiring more than one analytical dimension to achieve the high resolving power necessary for reliable analysis. In 1975, O’Farrell and Klose described two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), which could resolve complex protein mixtures into thousands of spots. Years later, upon the development of matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI), combined with database searching, the field of Proteomics began to grow dramatically. Researchers were able to characterize complex mixtures of proteins and gain novel biological insights. Despite the longstanding success of 2D-PAGE coupled with mass spectrometry, several fundamental issues with the technology, including the challenges of identifying low-abundance proteins, membrane proteins, and proteins with extremes in isoelectric point (pI) and molecular weight (MW), drove researchers to develop alternative approaches for the separation of complex mixtures. This led to the emergence of Shotgun Proteomics based on the coupling of high-performance liquid chromatography (HPLC) and mass spectrometry (MS).
    Similar to the shotgun genomic sequencing approach, in which DNA is broken into smaller pieces prior to sequencing and reassembled in silico, proteins are first digested into peptides and then analyzed by multidimensional chromatography coupled to tandem mass spectrometry (MS/MS). Thousands of tandem mass spectra are then compared to theoretical tandem mass spectra using database searching algorithms for the identification of proteins in the sample (Figure 1b). The inability of one-dimensional (1D) separation techniques to resolve complex biological samples for Shotgun Proteomics has required the development of multidimensional separation methods. A multidimensional separation includes two or more independent separation techniques (i.e., ion exchange, size exclusion, reversed phase, and affinity) coupled together for the analysis of a single sample. (Chem. Rev. 2007, 107, 3654−3686)

  • Quantitative Shotgun Proteomics using a protease with broad specificity and normalized spectral abundance factors
    Molecular BioSystems, 2007
    Co-Authors: Boris Zybailov, Laurence Florens, Michael P Washburn
    Abstract:

    Non-specific proteases are rarely used in quantitative Shotgun Proteomics due to potentially high false discovery rates. Yet there are instances when application of a non-specific protease is desirable to obtain sufficient sequence coverage of otherwise poorly accessible proteins or structural domains. Using the non-specific protease proteinase K, we analyzed Saccharomyces cerevisiae preparations grown in 14N rich media and 15N minimal media and obtained relative quantitation from the dataset using normalized spectral abundance factors (NSAFs). A critical step in using a spectral-counting-based approach for quantitative Proteomics is ensuring the inclusion of high-quality spectra in the dataset. One way to do this is to minimize the false discovery rate, which can be accomplished by applying different filters to a searched dataset. Natural-log transformation of proteinase K-derived NSAF values followed a normal distribution and allowed for statistical analysis by the t-test. Using this approach, we generated a dataset of 719 unique proteins found in each of three independent biological replicates, of which 84 showed a statistically significant difference in expression levels between the two growth conditions.
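The statistical step described above, log-transforming NSAF values and comparing two growth conditions with a t-test, can be sketched as follows. The equal-variance t statistic and the toy NSAF values are illustrative, and the significance lookup against a t distribution is omitted:

```python
import math
import statistics

def log_nsaf_ttest(a, b):
    """Natural-log transform NSAF values from two conditions and
    return the two-sample (pooled-variance) t statistic. Large
    |t| suggests a difference in abundance between conditions."""
    la = [math.log(x) for x in a]
    lb = [math.log(x) for x in b]
    na, nb = len(la), len(lb)
    va, vb = statistics.variance(la), statistics.variance(lb)
    # pooled standard deviation across both conditions
    sp = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(la) - statistics.mean(lb)) / \
           (sp * math.sqrt(1 / na + 1 / nb))

# toy NSAF values for one protein in two growth conditions
cond1 = [0.010, 0.012, 0.011]
cond2 = [0.020, 0.022, 0.021]
t = log_nsaf_ttest(cond1, cond2)
```

The log transform is what licenses the t-test here: the abstract reports that ln(NSAF) values are approximately normally distributed, whereas raw NSAF values are not.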

  • An automated multidimensional protein identification technology for Shotgun Proteomics
    Analytical Chemistry, 2001
    Co-Authors: Dirk Wolters, Michael P Washburn, John R Yates
    Abstract:

    We describe an automated method for Shotgun Proteomics named multidimensional protein identification technology (MudPIT), which combines multidimensional liquid chromatography with electrospray ionization tandem mass spectrometry. The multidimensional liquid chromatography method integrates a strong cation-exchange (SCX) resin and reversed-phase resin in a biphasic column. We detail the improvements over a system described by Link et al. (Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R., III. Nat. Biotechnol. 1999, 17, 676−682) that separates and acquires tandem mass spectra for thousands of peptides. Peptides elute off the SCX phase by increasing pI, and elution off the SCX material is evenly distributed across an analysis. In addition, we describe the chromatographic benchmarks of MudPIT. MudPIT was reproducible within 0.5% between two analyses. Furthermore, a dynamic range of 10,000 to 1 between the most abundant and least abundant proteins/pep...

William Stafford Noble - One of the best experts on this subject based on the ideXlab platform.

  • On the importance of well-calibrated scores for identifying Shotgun Proteomics spectra
    Journal of Proteome Research, 2015
    Co-Authors: Uri Keich, William Stafford Noble
    Abstract:

    Identifying the peptide responsible for generating an observed fragmentation spectrum requires scoring a collection of candidate peptides and then identifying the peptide that achieves the highest score. However, analysis of a large collection of such spectra requires that the score assigned to one spectrum be well-calibrated with respect to the scores assigned to other spectra. In this work, we define the notion of calibration in the context of Shotgun Proteomics spectrum identification, and we introduce a simple, albeit computationally intensive, technique to calibrate an arbitrary score function. We demonstrate that this calibration procedure yields an increased number of identified spectra at a fixed false discovery rate (FDR) threshold. We also show that proper calibration of scores has a surprising effect on a previously described FDR estimation procedure, making the procedure less conservative. Finally, we provide empirical results suggesting that even partial calibration, which is much less comput...
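One simple way to make scores comparable across spectra is to replace each raw score with its empirical p-value under a null score distribution; this sketch conveys the idea of calibration but is not the paper's exact procedure:

```python
def calibrate(score, null_scores):
    """Convert a raw PSM score into an empirical p-value by ranking
    it against scores drawn from a null model (e.g. random candidate
    peptides for the same spectrum). Calibrated values are directly
    comparable across spectra, unlike raw scores."""
    worse_or_equal = sum(s >= score for s in null_scores)
    # +1 in numerator and denominator avoids p-values of exactly 0
    return (worse_or_equal + 1) / (len(null_scores) + 1)

# hypothetical null scores for one spectrum
null = [0.2, 0.5, 0.9, 1.3, 2.0, 2.2, 3.1, 4.0, 4.4, 5.0]
print(calibrate(4.5, null))   # strong score -> small p-value
print(calibrate(0.1, null))   # weak score -> p-value near 1
```

This is also why the approach is computationally intensive, as the abstract notes: the null distribution must be generated separately for every spectrum.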

  • Determining the calibration of confidence estimation procedures for unique peptides in Shotgun Proteomics
    Journal of Proteomics, 2013
    Co-Authors: Viktor Granholm, William Stafford Noble, Jose Fernandez Navarro, Lukas Käll
    Abstract:

    The analysis of a Shotgun Proteomics experiment results in a list of peptide-spectrum matches (PSMs) in which each fragmentation spectrum has been matched to a peptide in a database. Subsequently, most protein inference algorithms rank peptides according to the best-scoring PSM for each peptide. However, there is disagreement in the scientific literature on the best method to assess the statistical significance of the resulting peptide identifications. Here, we use a previously described calibration protocol to evaluate the accuracy of three different peptide-level statistical confidence estimation procedures: the classical Fisher's method, and two complementary procedures that estimate significance, respectively, before and after selecting the top-scoring PSM for each peptide. Our experiments show that the latter method, which is employed by MaxQuant and Percolator, produces the most accurate, well-calibrated results.
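The collapse to peptide level that precedes confidence estimation can be sketched directly; the peptide strings and scores below are made up:

```python
def best_psm_per_peptide(psms):
    """Collapse PSMs to peptide level by keeping the best-scoring
    PSM for each peptide, the selection step evaluated in the
    before/after comparison above."""
    best = {}
    for peptide, score in psms:
        if peptide not in best or score > best[peptide]:
            best[peptide] = score
    return best

psms = [("PEPTIDEA", 3.1), ("PEPTIDEA", 4.2), ("PEPTIDEB", 2.5)]
print(best_psm_per_peptide(psms))
# -> {'PEPTIDEA': 4.2, 'PEPTIDEB': 2.5}
```

The order of operations matters statistically: taking the maximum of several PSM scores changes their null distribution, which is why estimating significance before versus after this step gives different calibration.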

  • Estimating relative abundances of proteins from Shotgun Proteomics data
    BMC Bioinformatics, 2012
    Co-Authors: Sean Mcilwain, Michael J Maccoss, Michael Mathews, Michael S Bereman, Edwin W Rubel, William Stafford Noble
    Abstract:

    Background: Spectral counting methods provide an easy means of identifying proteins with differing abundances between complex mixtures using Shotgun Proteomics data. The crux spectral-counts command, implemented as part of the Crux software toolkit, implements four previously reported spectral counting methods, the spectral index (SIN), the exponentially modified protein abundance index (emPAI), the normalized spectral abundance factor (NSAF), and the distributed normalized spectral abundance factor (dNSAF). Results: We compared the reproducibility and the linearity relative to each protein’s abundance of the four spectral counting metrics. Our analysis suggests that NSAF yields the most reproducible counts across technical and biological replicates, and both SIN and NSAF achieve the best linearity. Conclusions: With the crux spectral-counts command, Crux provides open-source modular methods to analyze mass spectrometry data for identifying and now quantifying peptides and proteins. The C++ source code, compiled binaries, spectra and sequence databases are available at http://noble.gs.washington.edu/proj/cruxspectral-counts.

  • A cross-validation scheme for machine learning algorithms in Shotgun Proteomics
    BMC Bioinformatics, 2012
    Co-Authors: Viktor Granholm, William Stafford Noble, Lukas Käll
    Abstract:

    Peptides are routinely identified from mass spectrometry-based Proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for Shotgun Proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.
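The proposed scheme can be sketched generically: a spectrum is only ever rescored by a model trained on the other folds. The `train`/`score` interface and the toy stand-ins below are illustrative, not the paper's implementation:

```python
def three_fold_rescore(spectra, train, score, k=3):
    """Cross-validation for target-decoy machine learning: the model
    that rescores a spectrum is never trained on that spectrum's own
    fold, guarding against overfitting and biased learning."""
    rescored = {}
    folds = [spectra[i::k] for i in range(k)]  # round-robin fold split
    for i in range(k):
        held_out = folds[i]
        training = [s for j in range(k) if j != i for s in folds[j]]
        model = train(training)            # fit on the other k-1 folds
        for s in held_out:
            rescored[s] = score(model, s)  # score only held-out spectra
    return rescored

# toy stand-ins: the "model" is just the sum of its training items,
# and each spectrum's new score is the model value itself
scores = three_fold_rescore(list(range(9)), train=sum,
                            score=lambda model, s: model)
print(scores[0], scores[4], scores[8])  # -> 27 24 21
```

The toy output makes the guarantee visible: each held-out item's score depends only on the other two folds, never on its own.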

  • On using samples of known protein content to assess the statistical calibration of scores assigned to peptide spectrum matches in Shotgun Proteomics
    Journal of Proteome Research, 2011
    Co-Authors: Viktor Granholm, William Stafford Noble, Lukas Käll
    Abstract:

    In Shotgun Proteomics, the quality of a hypothesized match between an observed spectrum and a peptide sequence is quantified by a score function. Because the score function lies at the heart of any ...