Database Searches

The Experts below are selected from a list of 40149 Experts worldwide ranked by ideXlab platform

Torbjorn Rognes - One of the best experts on this subject based on the ideXlab platform.

  • Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation
    2011
    Co-Authors: Torbjorn Rognes
    Abstract:

    The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375-residue query sequence, a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64-bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.
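
    As background for the figures quoted in this abstract, the following is a minimal scalar sketch of the Smith-Waterman scoring recurrence and of how a cell-update rate is computed. It is not SWIPE's SIMD implementation: the match/mismatch scores, the linear gap penalty, and the function names `smith_waterman_score` and `gcups` are assumptions made for illustration (the benchmarks above use BLOSUM matrices).

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between two sequences.

    Assumption: simple match/mismatch scoring and a linear gap penalty,
    chosen for brevity rather than taken from the abstract.
    """
    prev = [0] * (len(target) + 1)  # previous row of the score matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def gcups(query_length, database_residues, seconds):
    """Billions of matrix cells updated per second for one search."""
    return query_length * database_residues / seconds / 1e9


if __name__ == "__main__":
    print(smith_waterman_score("ACACACTA", "AGCACACA"))
    print(gcups(375, 1_000_000_000, 3.75))
```

    Every query residue is compared with every database residue, so a 375-residue query against a hypothetical database of one billion residues fills 375 billion cells; finishing that in 3.75 seconds would correspond to 100 GCUPS.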

  • ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches
    2001
    Co-Authors: Torbjorn Rognes
    Abstract:

    There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as those of Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
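
    The first stage described in this abstract, the optimal ungapped score on each diagonal of the alignment matrix, can be sketched as below. This is a scalar illustration only: the simple match/mismatch scoring and the function name `ungapped_diagonal_scores` are assumptions, and neither the SIMD parallelism nor the heuristic that combines several diagonals into an approximate gapped score is reproduced.

```python
def ungapped_diagonal_scores(query, target, match=2, mismatch=-1):
    """Best ungapped local score on every diagonal of the alignment matrix.

    A diagonal is identified by d = j - i, where i indexes the query and
    j indexes the target. Scoring values are illustrative assumptions.
    """
    scores = {}
    for d in range(-(len(query) - 1), len(target)):
        i = max(0, -d)  # first query position on this diagonal
        j = i + d       # matching target position
        best = running = 0
        while i < len(query) and j < len(target):
            step = match if query[i] == target[j] else mismatch
            running = max(0, running + step)  # drop a segment once it goes negative
            best = max(best, running)
            i += 1
            j += 1
        scores[d] = best
    return scores


if __name__ == "__main__":
    scores = ungapped_diagonal_scores("ACGT", "TTACGT")
    print(max(scores, key=scores.get), max(scores.values()))
```

    Because no gaps are allowed, each diagonal is independent of the others, which is what makes this stage a natural fit for data-parallel hardware.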

  • Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors
    2000
    Co-Authors: Torbjorn Rognes, Erling Seeberg
    Abstract:

    Motivation: Sequence database searching is among the most important and challenging tasks in bioinformatics. The ultimate choice of sequence-search algorithm is that of Smith–Waterman. However, because of the computationally demanding nature of this method, heuristic programs or special-purpose hardware alternatives have been developed. Increased speed has been obtained at the cost of reduced sensitivity or very expensive hardware. Results: A fast implementation of the Smith–Waterman sequence-alignment algorithm using Single-Instruction, Multiple-Data (SIMD) technology is presented. This implementation is based on the MultiMedia eXtensions (MMX) and Streaming SIMD Extensions (SSE) technology that is embedded in Intel’s latest microprocessors. Similar technology exists also in other modern microprocessors. Six-fold speed-up relative to the fastest previously known Smith–Waterman implementation on the same hardware was achieved by an optimized 8-way parallel processing approach. A speed of more than 150 million cell updates per second was obtained on a single Intel Pentium III 500 MHz microprocessor. This is probably the fastest implementation of this algorithm on a single general-purpose microprocessor described to date. Availability: Online searches with the software are available at http://dna.uio.no/search/

Pavel A Pevzner - One of the best experts on this subject based on the ideXlab platform.

  • Gapped spectral dictionaries and their applications for database searches of tandem mass spectra
    2011
    Co-Authors: Kyowon Jeong, Sangtae Kim, Nuno Bandeira, Pavel A Pevzner
    Abstract:

    Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries quickly grow with the peptide length, making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length, thus addressing the limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small, thus opening a possibility of using them to speed up MS/MS searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time-consuming with existing approaches. MS-GappedDictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full-length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags and introduce gapped tags that have advantages over the conventional peptide sequence tags in MS/MS database searches.

  • Gapped spectral dictionaries and their applications for database searches of tandem mass spectra
    2010
    Co-Authors: Kyowon Jeong, Sangtae Kim, Nuno Bandeira, Pavel A Pevzner
    Abstract:

    Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries quickly grow with the peptide length, making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length, thus addressing the shortcoming of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small, thus opening a possibility of using them to speed up MS/MS database searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications that are prohibitively time-consuming with existing approaches. We further introduce gapped tags that have advantages over the conventional peptide sequence tags in filtration-based MS/MS database searches.

  • Peptide sequence tags for fast database search in mass spectrometry
    2005
    Co-Authors: Ari Frank, Stephen Tanner, Vineet Bafna, Pavel A Pevzner
    Abstract:

    Filtration techniques in the form of rapid elimination of candidate sequences while retaining the true one are key ingredients of database searches in genomics. Although SEQUEST and Mascot perform a conceptually similar task to the tool BLAST, the key algorithmic idea of BLAST (filtration) was never implemented in these tools. As a result, MS/MS protein identification tools are becoming too time-consuming for many applications, including search for post-translationally modified peptides. Moreover, matching millions of spectra against all known proteins will soon make these tools too slow in the same way that "genome vs genome" comparisons instantly made BLAST too slow. We describe the development of filters for MS/MS database searches that dramatically reduce the running time and effectively remove the bottlenecks in searching the huge space of protein modifications. Our approach, based on a probability model for determining the accuracy of sequence tags, achieves superior results compared to GutenTag, a popular tag generation algorithm. Our tag generating algorithm along with our de novo sequencing algorithm PepNovo can be accessed via the URL http://peptide.ucsd.edu/.
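
    The filtration idea in this abstract can be illustrated with a small sketch: a sequence tag (a short run of residues plus the mass that precedes it) discards most candidate peptides before any expensive spectrum scoring. This is only an illustration of tag-based filtering under stated assumptions. The function names, the 0.5 Da tolerance, and the example peptides are hypothetical, and the paper's probability model for tag accuracy is not reproduced. Residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_mass_sum(peptide):
    """Sum of residue masses (no water or terminal groups added)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)


def filter_by_tag(peptides, tag, prefix_mass, tolerance=0.5):
    """Keep peptides that contain `tag` preceded by residues of `prefix_mass`.

    Assumption: exact string matching of the tag, so the I/L ambiguity
    and modified residues are not handled in this sketch.
    """
    survivors = []
    for peptide in peptides:
        start = peptide.find(tag)
        while start != -1:
            if abs(residue_mass_sum(peptide[:start]) - prefix_mass) <= tolerance:
                survivors.append(peptide)
                break
            start = peptide.find(tag, start + 1)
    return survivors


if __name__ == "__main__":
    candidates = ["GASPVTK", "PEPTIDEK", "VTKGASP"]
    print(filter_by_tag(candidates, "SPV", residue_mass_sum("GA")))
```

    Only the peptides that survive the filter would be passed on to full spectrum-to-peptide scoring, which is where the reduction in running time comes from.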

Andrej Shevchenko - One of the best experts on this subject based on the ideXlab platform.

  • Simplified validation of borderline hits of database searches
    2008
    Co-Authors: Henrik Thomas, Andrej Shevchenko
    Abstract:

    Along with unequivocal hits produced by matching multiple MS/MS spectra to database sequences, LC-MS/MS analysis often yields a large number of hits of borderline statistical confidence. To simplify their validation, we propose to use rapid de novo interpretation of all acquired MS/MS spectra and, with the help of a simple software tool, display the candidate sequences together with each database search hit. We demonstrate that comparing hit database sequences and independent de novo interpretations of the same MS/MS spectra assists in rapid examination of ambiguous matches.

  • Separating the wheat from the chaff: unbiased filtering of background tandem mass spectra improves protein identification
    2008
    Co-Authors: Magno Junqueira, Henrik Thomas, Victor Spirin, Tiago Santana Balbuena, Patrice Waridel, Vineeth Surendranath, Gregory Kryukov, Ivan Adzhubei, Shamil R Sunyaev, Andrej Shevchenko
    Abstract:

    Only a small fraction of spectra acquired in LC-MS/MS runs match peptides from target proteins upon database searches. The remaining spectra, operationally termed background, originate from a variety of poorly controlled sources and affect the throughput and confidence of database searches. Here, we report an algorithm and its software implementation that rapidly removes background spectra, regardless of their precise origin. The method estimates the dissimilarity distance between screened MS/MS spectra and unannotated spectra from a partially redundant background library compiled from several control and blank runs. Filtering MS/MS queries enhanced the protein identification capacity when searches lacked spectrum-to-sequence matching specificity. In sequence-similarity searches it reduced by, on average, 30-fold the number of orphan hits, which were not explicitly related to background protein contaminants and required manual validation. Removing high-quality background MS/MS spectra, while preserving in the data set the genuine spectra from target proteins, decreased the false positive rate of stringent database searches and improved the identification of low-abundance proteins.

  • Error-tolerant EST database searches by tandem mass spectrometry and MultiTag software
    2005
    Co-Authors: Adam J Liska, Shamil R Sunyaev, Ignat N Shilov, Dan A Schaeffer, Andrej Shevchenko
    Abstract:

    The MultiTag method (Sunyaev et al., Anal. Chem. 2003, 75, 1307-1315) employs multiple error-tolerant searches with peptide sequence tags (Mann and Wilm, Anal. Chem. 1994, 66, 4390-4399) for the identification of proteins from organisms with unsequenced genomes. Here we demonstrate that the error-tolerant capabilities of MultiTag increased the number of peptide alignments and improved the confidence of identifications in an EST database. MultiTag outperformed conventional database searching software that only utilizes stringent matching of tandem mass spectra to nucleotide sequences of ESTs.

Steven P Gygi - One of the best experts on this subject based on the ideXlab platform.

  • A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides
    2015
    Co-Authors: Joel M Chick, Deepak Kolippakkam, David P Nusinow, Bo Zhai, Edward L Huttlin, Steven P Gygi
    Abstract:

    In shotgun proteomics experiments, modified peptides account for a large part of the unassigned spectra and can be identified using ultra-tolerant database searches.

  • Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry
    2007
    Co-Authors: Joshua E Elias, Steven P Gygi
    Abstract:

    Liquid chromatography and tandem mass spectrometry (LC-MS/MS) has become the preferred method for conducting large-scale surveys of proteomes. Automated interpretation of tandem mass spectrometry (MS/MS) spectra can be problematic, however, for a variety of reasons. As most sequence search engines return results even for 'unmatchable' spectra, proteome researchers must devise ways to distinguish correct from incorrect peptide identifications. The target-decoy search strategy represents a straightforward and effective way to manage this effort. Despite the apparent simplicity of this method, some controversy surrounds its successful application. Here we clarify our preferred methodology by addressing four issues based on observed decoy hit frequencies: (i) the major assumptions made with this database search strategy are reasonable; (ii) concatenated target-decoy database searches are preferable to separate target and decoy database searches; (iii) the theoretical error associated with target-decoy false positive (FP) rate measurements can be estimated; and (iv) alternate methods for constructing decoy databases are similarly effective once certain considerations are taken into account.
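
    A concatenated target-decoy search can be sketched as follows. The sketch assumes decoys are built by reversing each target sequence (one of several possible constructions) and that a wrong match is equally likely to land on a target or a decoy entry, so false positives among accepted hits are estimated as twice the number of accepted decoy hits. The function names, the `DECOY_` prefix, and the example scores are hypothetical.

```python
def make_concatenated_db(targets, decoy_prefix="DECOY_"):
    """Return the target entries plus one reversed decoy per target.

    Assumption: sequence reversal is used to build decoys.
    """
    database = dict(targets)
    for name, sequence in targets.items():
        database[decoy_prefix + name] = sequence[::-1]
    return database


def estimated_fp_rate(hits, threshold, decoy_prefix="DECOY_"):
    """Estimate the false positive rate among hits scoring >= threshold.

    `hits` is a list of (protein_name, score) pairs from a search against
    the concatenated database. Each accepted decoy hit is assumed to stand
    for one undetected false hit among the targets, hence the factor of 2.
    """
    accepted = [(name, score) for name, score in hits if score >= threshold]
    if not accepted:
        return 0.0
    decoy_hits = sum(1 for name, _ in accepted if name.startswith(decoy_prefix))
    return 2 * decoy_hits / len(accepted)


if __name__ == "__main__":
    print(make_concatenated_db({"P1": "MKT"}))
    hits = [("P1", 5.0), ("DECOY_P2", 4.0), ("P3", 3.5), ("P4", 3.0), ("P5", 1.0)]
    print(estimated_fp_rate(hits, threshold=3.0))
```

    In practice the score threshold is usually tightened until the estimated rate falls below a chosen level, for example 1%.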