Speedup Technique

The experts below are selected from a list of 9471 experts worldwide ranked by the ideXlab platform.

Sanguthevar Rajasekaran - One of the best experts on this subject based on the ideXlab platform.

An elegant algorithm for the construction of suffix arrays

Journal of Discrete Algorithms, 2014

Co-Authors: Sanguthevar Rajasekaran, Marius Nicolae

Abstract:

The suffix array is a data structure that finds numerous applications in string processing problems for both linguistic texts and biological data. It has been introduced as a memory efficient alternative for suffix trees. The suffix array consists of the sorted suffixes of a string. There are several linear time suffix array construction algorithms (SACAs) known in the literature. However, one of the fastest algorithms in practice has a worst case run time of O(n^2). The problem of designing practically and theoretically efficient techniques remains open. In this paper we present an elegant algorithm for suffix array construction which takes linear time with high probability; the probability is on the space of all possible inputs. Our algorithm is one of the simplest of the known SACAs and it opens up a new dimension of suffix array construction that has not been explored until now. Our algorithm is easily parallelizable. We offer parallel implementations on various parallel models of computing. We prove a lemma on the ℓ-mers of a random string which might find independent applications. We also present another algorithm that utilizes the above algorithm. This algorithm is called RadixSA and has a worst case run time of O(n log n). RadixSA introduces an idea that may find independent applications as a speedup technique for other SACAs. An empirical comparison of RadixSA with other algorithms on various datasets reveals that our algorithm is one of the fastest algorithms to date. The C++ source code is freely available at http://www.engr.uconn.edu/~man09004/radixSA.zip.
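
To make the definition above concrete, here is a minimal Python sketch of what a suffix array contains. It is not RadixSA and not the authors' linear-time algorithm: `suffix_array_naive` simply sorts the suffixes, and `suffix_array_bucketed` is a hypothetical illustration of grouping suffixes by their first `k` characters before sorting each group. The function names and the parameter `k` are assumptions made for this example.

```python
from collections import defaultdict


def suffix_array_naive(text):
    """Return the start indices of all suffixes of text, in sorted suffix order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_array_bucketed(text, k=2):
    """Group suffixes by their first k characters, then sort inside each group.

    Illustrative only: this is a simple bucketing idea, not the published
    RadixSA procedure, and it still compares whole suffixes inside a bucket.
    """
    buckets = defaultdict(list)
    for i in range(len(text)):
        buckets[text[i:i + k]].append(i)

    result = []
    # Buckets with smaller prefixes hold lexicographically smaller suffixes.
    for prefix in sorted(buckets):
        result.extend(sorted(buckets[prefix], key=lambda i: text[i:]))
    return result


if __name__ == "__main__":
    print(suffix_array_naive("banana"))     # [5, 3, 1, 0, 4, 2]
    print(suffix_array_bucketed("banana"))  # same order as the naive version
```

Both functions return positions rather than the suffixes themselves, which is the usual reason a suffix array needs less memory than a suffix tree.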

An elegant algorithm for the construction of suffix arrays

arXiv: Data Structures and Algorithms, 2013

Co-Authors: Sanguthevar Rajasekaran, Marius Nicolae


A Speedup Technique for (l, d)-motif finding algorithms

BMC Research Notes, 2011

Co-Authors: Sanguthevar Rajasekaran, Hieu Dinh

Abstract:

Background: The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem.

Results: Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (or Planted Motif Search (PMS)), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, whereas an approximate algorithm may fail to identify some or all of the motifs. The exact version of the PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speed up PMS algorithms.

Conclusions: We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms. These experimental results show that our speedup technique is indeed very effective.

The implementation of the algorithms is freely available on the web at http://www.engr.uconn.edu/rajasek/PMS4.zip
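
The abstract does not describe the speedup technique itself, so the following Python sketch only illustrates the problem it targets. It assumes the usual reading of (l, d)-motif search: find every string of length l that occurs in each input sequence with at most d mismatches. This is a brute-force exact search written for illustration, not the authors' PMS4 code, and all function names are hypothetical.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(lmer, d, alphabet="ACGT"):
    """All strings within Hamming distance d of lmer."""
    result = set()
    for r in range(d + 1):
        for positions in combinations(range(len(lmer)), r):
            for letters in product(alphabet, repeat=r):
                candidate = list(lmer)
                for pos, ch in zip(positions, letters):
                    candidate[pos] = ch
                result.add("".join(candidate))
    return result


def occurs_within(motif, seq, d):
    """True if some l-mer of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def planted_motifs(seqs, l, d):
    """Exact brute-force (l, d)-motif search over a list of sequences."""
    first = seqs[0]
    # Any motif must lie within distance d of some l-mer of the first sequence.
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates |= neighbors(first[i:i + l], d)
    return sorted(m for m in candidates
                  if all(occurs_within(m, s, d) for s in seqs[1:]))


if __name__ == "__main__":
    print(planted_motifs(["AACGT", "CGTAA", "TCGTT"], l=3, d=0))  # ['CGT']
```

The candidate set grows quickly with l and d, which is consistent with the abstract's remark that exact algorithms take time exponential in some of the parameters.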


Dominik Schultes - One of the best experts on this subject based on the ideXlab platform.

Highway hierarchies hasten exact shortest path queries

European Symposium on Algorithms, 2005

Co-Authors: Peter Sanders, Dominik Schultes

Abstract:

We present a new speedup technique for route planning that exploits the hierarchy inherent in real-world road networks. Our algorithm preprocesses the eight-digit number of nodes needed for maps of the USA or Western Europe in a few hours using linear space. Shortest (i.e., fastest) path queries then take around eight milliseconds to produce exact shortest paths. This is about 2,000 times faster than using Dijkstra’s algorithm.
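
For reference, the baseline that the abstract compares against can be sketched as follows. This is a plain Dijkstra query in Python, not the highway hierarchies method; the adjacency-list format and the returned count of settled nodes are assumptions chosen for this example, the count being a rough indicator of how much of the graph a query has to touch.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (shortest distance, number of settled nodes) for one query.

    graph maps each node to a list of (neighbor, non-negative weight) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale queue entry for an already settled node
        settled.add(u)
        if u == target:
            return d, len(settled)
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf"), len(settled)


if __name__ == "__main__":
    roads = {
        "A": [("B", 4), ("C", 1)],
        "B": [("D", 3)],
        "C": [("B", 2), ("D", 7)],
        "D": [],
    }
    print(dijkstra(roads, "A", "D"))  # (6, 4)
```

On a continental road network a query like this may settle a large share of the nodes, which is the cost that preprocessing-based speedup techniques aim to avoid.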


Joshua Goodman - One of the best experts on this subject based on the ideXlab platform.

Classes for Fast Maximum Entropy Training

arXiv: Computation and Language, 2001

Co-Authors: Joshua Goodman

Abstract:

Maximum entropy models are considered by many to be one of the most promising avenues of language modeling research. Unfortunately, long training times make maximum entropy research difficult. We present a novel speedup technique: we change the form of the model to use classes. Our speedup works by creating two maximum entropy models, the first of which predicts the class of each word, and the second of which predicts the word itself. This factoring of the model leads to fewer non-zero indicator functions, and faster normalization, achieving speedups of up to a factor of 35 over one of the best previous techniques. It also results in typically slightly lower perplexities. The same trick can be used to speed training of other machine learning techniques, e.g. neural networks, applied to any problem with a large number of outputs, such as language modeling.
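
The factoring described above can be sketched in a few lines of Python. The sketch assumes that each word belongs to exactly one class and that the probability of a word is the product of a class probability and a within-class word probability. The scores are made-up stand-ins for the weighted feature sums a trained model would produce for one history, and every name here is hypothetical rather than taken from the paper.

```python
import math


def softmax(scores):
    """Normalize a list of real-valued scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def factored_prob(word, word_class, class_scores, word_scores_by_class):
    """P(word | history) = P(class | history) * P(word | class, history)."""
    cls = word_class[word]

    # First model: normalize over the classes only.
    classes = sorted(class_scores)
    p_class = softmax([class_scores[c] for c in classes])[classes.index(cls)]

    # Second model: normalize only over the words inside the chosen class.
    members = sorted(word_scores_by_class[cls])
    member_scores = [word_scores_by_class[cls][w] for w in members]
    p_word = softmax(member_scores)[members.index(word)]

    return p_class * p_word


if __name__ == "__main__":
    word_class = {"cat": "animal", "dog": "animal", "run": "verb", "eat": "verb"}
    class_scores = {"animal": 1.0, "verb": 0.0}
    word_scores = {"animal": {"cat": 0.5, "dog": 0.5},
                   "verb": {"run": 2.0, "eat": 0.0}}
    total = sum(factored_prob(w, word_class, class_scores, word_scores)
                for w in word_class)
    print(round(total, 6))  # 1.0
```

Each evaluation normalizes over the number of classes plus the size of one class instead of the whole vocabulary. With, say, 10,000 words split evenly into 100 classes, that is 200 terms rather than 10,000, which gives a sense of where the faster normalization comes from.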

ICASSP - Classes for fast maximum entropy training

2001 IEEE International Conference on Acoustics Speech and Signal Processing. Proceedings (Cat. No.01CH37221), 1

Co-Authors: Joshua Goodman



Hieu Dinh - One of the best experts on this subject based on the ideXlab platform.

A Speedup Technique for (l, d)-motif finding algorithms

BMC Research Notes, 2011

Co-Authors: Sanguthevar Rajasekaran, Hieu Dinh


Peter Sanders - One of the best experts on this subject based on the ideXlab platform.

Highway hierarchies hasten exact shortest path queries

European Symposium on Algorithms, 2005

Co-Authors: Peter Sanders, Dominik Schultes

