Database Searches - Explore the Science & Experts

The Experts below are selected from a list of 40149 Experts worldwide ranked by ideXlab platform

Torbjorn Rognes - One of the best experts on this subject based on the ideXlab platform.

Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

2011

Co-Authors: Torbjorn Rognes

Abstract:

The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375-residue query sequence, a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64-bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.
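As a rough illustration of what a "cell update" is, the sketch below computes the optimal local alignment score with a plain scalar Smith-Waterman recurrence and counts the matrix cells a database search touches. It is not SWIPE's implementation: the match/mismatch scores, the linear gap penalty, and the function names are assumptions chosen for brevity, whereas real tools use substitution matrices such as BLOSUM62, typically with affine gap penalties, and fill many cells per instruction with SIMD.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Optimal local alignment score (assumed scoring: linear gap penalty)."""
    prev = [0] * (len(target) + 1)   # previous row of the score matrix
    best = 0
    for q in query:
        curr = [0]                   # column 0 is always zero
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # One "cell update": best of restart, substitution, or a gap
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search(query, database):
    """Score the query against each sequence and count the cells filled."""
    scores = {name: smith_waterman_score(query, seq)
              for name, seq in database.items()}
    cells = len(query) * sum(len(seq) for seq in database.values())
    return scores, cells


def gcups(cells, seconds):
    """Billions of cell updates per second."""
    return cells / seconds / 1e9
```

Dividing the cell count returned by `search` by the measured wall-clock time gives a throughput figure in the same unit as the one quoted in the abstract.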

ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches

2001

Co-Authors: Torbjorn Rognes

Abstract:

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as those of Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
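The first step described above, the exact optimal ungapped score on every diagonal, can be sketched in scalar form as follows. This is only an illustration under assumed match/mismatch scores and a hypothetical function name; it does not reproduce ParAlign's SIMD layout or its heuristic for combining diagonals, which the abstract does not specify.

```python
def best_ungapped_scores(query, target, match=2, mismatch=-1):
    """Map each diagonal d = j - i to its best ungapped segment score."""
    best = {}
    for d in range(-(len(query) - 1), len(target)):
        i = max(0, -d)        # first query position on this diagonal
        j = i + d             # matching target position
        running = 0           # score of the segment ending at (i, j)
        top = 0               # best segment score seen on this diagonal
        while i < len(query) and j < len(target):
            s = match if query[i] == target[j] else mismatch
            running = max(0, running + s)   # drop the segment if it goes negative
            top = max(top, running)
            i += 1
            j += 1
        best[d] = top
    return best
```

Each diagonal is independent of the others, which is what makes this step a natural fit for processing several diagonals at once.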

Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors

2000

Co-Authors: Torbjorn Rognes, Erling Seeberg

Abstract:

Motivation: Sequence database searching is among the most important and challenging tasks in bioinformatics. The ultimate choice of sequence-search algorithm is that of Smith–Waterman. However, because of the computationally demanding nature of this method, heuristic programs or special-purpose hardware alternatives have been developed. Increased speed has been obtained at the cost of reduced sensitivity or very expensive hardware. Results: A fast implementation of the Smith–Waterman sequence-alignment algorithm using Single-Instruction, Multiple-Data (SIMD) technology is presented. This implementation is based on the MultiMedia eXtensions (MMX) and Streaming SIMD Extensions (SSE) technology that is embedded in Intel’s latest microprocessors. Similar technology exists also in other modern microprocessors. Six-fold speed-up relative to the fastest previously known Smith–Waterman implementation on the same hardware was achieved by an optimized 8-way parallel processing approach. A speed of more than 150 million cell updates per second was obtained on a single Intel Pentium III 500 MHz microprocessor. This is probably the fastest implementation of this algorithm on a single general-purpose microprocessor described to date. Availability: Online searches with the software are available at http://dna.uio.no/search/

Pavel A Pevzner

Gapped spectral dictionaries and their applications for database searches of tandem mass spectra

2011

Co-Authors: Kyowon Jeong, Sangtae Kim, Nuno Bandeira, Pavel A Pevzner

Abstract:

Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries quickly grow with the peptide length, making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length, thus addressing the limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small, thus opening a possibility of using them to speed up MS/MS searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time consuming with existing approaches. MS-GappedDictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags and introduce gapped tags that have advantages over the conventional peptide sequence tags in MS/MS database searches.

Gapped spectral dictionaries and their applications for database searches of tandem mass spectra

2010

Co-Authors: Kyowon Jeong, Sangtae Kim, Nuno Bandeira, Pavel A Pevzner

Abstract:

Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries quickly grow with the peptide length, making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length, thus addressing the shortcoming of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small, thus opening a possibility of using them to speed up MS/MS database searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications that are prohibitively time consuming with existing approaches. We further introduce gapped tags that have advantages over the conventional peptide sequence tags in filtration-based MS/MS database searches.

Peptide sequence tags for fast database search in mass spectrometry

2005

Co-Authors: Ari Frank, Stephen Tanner, Vineet Bafna, Pavel A Pevzner

Abstract:

Filtration techniques in the form of rapid elimination of candidate sequences while retaining the true one are key ingredients of database searches in genomics. Although SEQUEST and Mascot perform a conceptually similar task to the tool BLAST, the key algorithmic idea of BLAST (filtration) was never implemented in these tools. As a result, MS/MS protein identification tools are becoming too time-consuming for many applications, including search for post-translationally modified peptides. Moreover, matching millions of spectra against all known proteins will soon make these tools too slow in the same way that "genome vs genome" comparisons instantly made BLAST too slow. We describe the development of filters for MS/MS database searches that dramatically reduce the running time and effectively remove the bottlenecks in searching the huge space of protein modifications. Our approach, based on a probability model for determining the accuracy of sequence tags, achieves superior results compared to GutenTag, a popular tag generation algorithm. Our tag generating algorithm along with our de novo sequencing algorithm PepNovo can be accessed via the URL http://peptide.ucsd.edu/.
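The filtration idea can be illustrated with a minimal sketch: keep only the database proteins that contain at least one short sequence tag, so that the expensive spectrum scoring runs on far fewer candidates. This is a simplification and not the authors' filter: the function names are hypothetical, the flanking prefix and suffix masses that accompany real peptide sequence tags are ignored, and the only mass-spectrometry detail modelled is that isoleucine and leucine have the same mass and so are treated as one letter.

```python
def normalise(sequence):
    """Treat I and L as the same residue, since they have identical mass."""
    return sequence.upper().replace("I", "L")


def filter_by_tags(database, tags):
    """Keep proteins whose sequence contains at least one tag as a substring."""
    wanted = [normalise(tag) for tag in tags]
    return {
        name: seq
        for name, seq in database.items()
        if any(tag in normalise(seq) for tag in wanted)
    }
```

For example, with the made-up entries `{"P1": "MKTAYIAKQR", "P2": "GGSLVEK"}` and the tag `"YLA"`, only `P1` survives the filter, because its `YIA` stretch is indistinguishable from `YLA` by mass.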

Andrej Shevchenko

Simplified validation of borderline hits of database searches

2008

Co-Authors: Henrik Thomas, Andrej Shevchenko

Abstract:

Along with unequivocal hits produced by matching multiple MS/MS spectra to database sequences, LC-MS/MS analysis often yields a large number of hits of borderline statistical confidence. To simplify their validation, we propose to use rapid de novo interpretation of all acquired MS/MS spectra and, with the help of a simple software tool, display the candidate sequences together with each database search hit. We demonstrate that comparing hit database sequences and independent de novo interpretations of the same MS/MS spectra assists in rapid examination of ambiguous matches.

Separating the wheat from the chaff: unbiased filtering of background tandem mass spectra improves protein identification

2008

Co-Authors: Magno Junqueira, Henrik Thomas, Victor Spirin, Tiago Santana Balbuena, Patrice Waridel, Vineeth Surendranath, Gregory Kryukov, Ivan Adzhubei, Shamil R Sunyaev, Andrej Shevchenko

Abstract:

Only a small fraction of spectra acquired in LC-MS/MS runs matches peptides from target proteins upon database searches. The remaining, operationally termed background, spectra originate from a variety of poorly controlled sources and affect the throughput and confidence of database searches. Here, we report an algorithm and its software implementation that rapidly removes background spectra, regardless of their precise origin. The method estimates the dissimilarity distance between screened MS/MS spectra and unannotated spectra from a partially redundant background library compiled from several control and blank runs. Filtering MS/MS queries enhanced the protein identification capacity when searches lacked spectrum to sequence matching specificity. In sequence-similarity searches it reduced by, on average, 30-fold the number of orphan hits, which were not explicitly related to background protein contaminants and required manual validation. Removing high quality background MS/MS spectra, while preserving in the data set the genuine spectra from target proteins, decreased the false positive rate of stringent database searches and improved the identification of low-abundance proteins.
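A library-based background filter of this kind might look like the sketch below. The abstract does not say which dissimilarity measure the authors use, so the measure here (one minus the cosine similarity of binned peak intensities), the bin width, the threshold, and all function names are assumptions made purely for illustration.

```python
import math


def binned(spectrum, bin_width=1.0):
    """Sum peak intensities into m/z bins; spectrum is a list of (mz, intensity)."""
    bins = {}
    for mz, intensity in spectrum:
        key = int(mz // bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def dissimilarity(a, b, bin_width=1.0):
    """Assumed measure: 1 - cosine similarity of the two binned spectra."""
    va, vb = binned(a, bin_width), binned(b, bin_width)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0            # an empty spectrum resembles nothing
    return 1.0 - dot / (norm_a * norm_b)


def remove_background(queries, library, threshold=0.1):
    """Drop every query spectrum that is close to any background library spectrum."""
    return [
        q for q in queries
        if all(dissimilarity(q, bg) > threshold for bg in library)
    ]
```

Because the comparison uses only peak positions and intensities, the library spectra need no annotation, which matches the "regardless of their precise origin" property described above.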

Error-tolerant EST database searches by tandem mass spectrometry and MultiTag software

2005

Co-Authors: Adam J Liska, Shamil R Sunyaev, Ignat N Shilov, Dan A Schaeffer, Andrej Shevchenko

Abstract:

The MultiTag method (Sunyaev et al., Anal. Chem. 2003, 75, 1307-1315) employs multiple error-tolerant searches with peptide sequence tags (Mann and Wilm, Anal. Chem. 1994, 66, 4390-4399) for the identification of proteins from organisms with unsequenced genomes. Here we demonstrate that the error-tolerant capabilities of MultiTag increased the number of peptide alignments and improved the confidence of identifications in an EST database. MultiTag outperformed conventional database searching software that only utilizes stringent matching of tandem mass spectra to nucleotide sequences of ESTs.

Steven P Gygi

A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides

2015

Co-Authors: Joel M Chick, Deepak Kolippakkam, David P Nusinow, Bo Zhai, Edward L Huttlin, Steven P Gygi

Abstract:

In shotgun proteomics experiments, modified peptides account for a large part of the unassigned spectra and can be identified using ultra-tolerant database searches.

Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry

2007

Co-Authors: Joshua E Elias, Steven P Gygi

Abstract:

Liquid chromatography and tandem mass spectrometry (LC-MS/MS) has become the preferred method for conducting large-scale surveys of proteomes. Automated interpretation of tandem mass spectrometry (MS/MS) spectra can be problematic, however, for a variety of reasons. As most sequence search engines return results even for 'unmatchable' spectra, proteome researchers must devise ways to distinguish correct from incorrect peptide identifications. The target-decoy search strategy represents a straightforward and effective way to manage this effort. Despite the apparent simplicity of this method, some controversy surrounds its successful application. Here we clarify our preferred methodology by addressing four issues based on observed decoy hit frequencies: (i) the major assumptions made with this database search strategy are reasonable; (ii) concatenated target-decoy database searches are preferable to separate target and decoy database searches; (iii) the theoretical error associated with target-decoy false positive (FP) rate measurements can be estimated; and (iv) alternate methods for constructing decoy databases are similarly effective once certain considerations are taken into account.
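A concatenated target-decoy search can be sketched as below. Two things are assumed rather than taken from the abstract: decoys are built by reversing each target sequence (one of several possible constructions), and the false positive rate is estimated by doubling the decoy hit count, which relies on incorrect matches landing in the target and decoy halves equally often. The `DECOY_` prefix and the function names are hypothetical.

```python
DECOY_PREFIX = "DECOY_"   # hypothetical naming convention for decoy entries


def make_concatenated_db(targets):
    """Return one database holding every target plus its reversed decoy."""
    db = dict(targets)
    for name, seq in targets.items():
        db[DECOY_PREFIX + name] = seq[::-1]   # assumption: reversed-sequence decoys
    return db


def estimate_fp_rate(hit_names):
    """Estimate the FP rate among accepted hits as 2 * decoy hits / all hits."""
    total = len(hit_names)
    if total == 0:
        return 0.0
    decoys = sum(1 for name in hit_names if name.startswith(DECOY_PREFIX))
    # Each decoy hit is taken to imply one unseen incorrect target hit
    return 2 * decoys / total
```

In practice the score threshold for accepting hits is tightened or relaxed until this estimate reaches the error level the study is willing to tolerate.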
