Hamming Distance

The experts below are selected from a list of 19,419 experts worldwide ranked by the ideXlab platform.

Eric Torng - One of the best experts on this subject based on the ideXlab platform.

Large-scale Hamming Distance query processing

International Conference on Data Engineering, 2011

Co-Authors: Alex X Liu, Ke Shen, Eric Torng

Abstract:

Hamming Distance has been widely used in many application domains, such as near-duplicate detection and pattern recognition. We study Hamming Distance range query problems, where the goal is to find all strings in a database that are within a Hamming Distance bound k of a query string. If k is fixed, we have a static Hamming Distance range query problem; if k is part of the input, we have a dynamic one. For the static problem, the prior art uses a lot of memory because it aggressively replicates the database. For the dynamic problem, as far as we know, there is no space- and time-efficient solution for arbitrary databases. In this paper, we first propose a static Hamming Distance range query algorithm called HEngine_s, which addresses the space issue of the prior art by expanding the query on the fly. We then propose a dynamic Hamming Distance range query algorithm called HEngine_d, which addresses the limitation of the prior art using a divide-and-conquer strategy. We implemented our algorithms and conducted side-by-side comparisons on large real-world and synthetic datasets. In our experiments, HEngine_s uses 4.65 times less space and processes queries 16% faster than the prior art, and HEngine_d processes queries 46 times faster than linear scan while using only 1.7 times more space.
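The static range query can be made concrete with a small sketch. It assumes fixed-width binary fingerprints stored as Python ints and uses a generic pigeonhole filter: with r = k // 2 + 1 segments, any string within distance k of the query agrees with it on at least one segment up to a single flipped bit (since 2r > k). Each segment is therefore indexed once, and the query is expanded into its 1-bit variants at lookup time instead of replicating the database. This illustrates the kind of on-the-fly query expansion the abstract mentions; it is not claimed to be the authors' static algorithm. The names `hamming`, `linear_scan`, `segment_bounds`, `segment` and `SegmentIndex` are hypothetical, and the sketch assumes `width >= k // 2 + 1`.

```python
from collections import defaultdict


def hamming(a, b):
    """Hamming distance between two equal-width bit strings stored as ints."""
    return bin(a ^ b).count("1")


def linear_scan(db, q, k):
    """Baseline range query: all fingerprints within Hamming distance k of q."""
    return [x for x in db if hamming(x, q) <= k]


def segment_bounds(width, r):
    """Split bit positions [0, width) into r contiguous (lo, hi) ranges."""
    seg = width // r
    return [(i * seg, width if i == r - 1 else (i + 1) * seg) for i in range(r)]


def segment(x, lo, hi):
    """Extract bits lo..hi-1 of x as an integer."""
    return (x >> lo) & ((1 << (hi - lo)) - 1)


class SegmentIndex:
    """Hypothetical static index for a fixed bound k (a sketch, not the paper's algorithm).

    Pigeonhole argument: with r = k // 2 + 1 segments, two strings within
    distance k must agree on some segment up to at most one flipped bit,
    because 2 * r > k.
    """

    def __init__(self, db, width, k):
        self.db, self.k = db, k
        self.bounds = segment_bounds(width, k // 2 + 1)
        # One exact-match table per segment: segment value -> database positions.
        # The database itself is stored once; only segment keys are indexed.
        self.tables = [defaultdict(list) for _ in self.bounds]
        for pos, x in enumerate(db):
            for table, (lo, hi) in zip(self.tables, self.bounds):
                table[segment(x, lo, hi)].append(pos)

    def query(self, q):
        candidates = set()
        for table, (lo, hi) in zip(self.tables, self.bounds):
            s = segment(q, lo, hi)
            # Expand the query segment on the fly: itself plus every 1-bit variant.
            for v in [s] + [s ^ (1 << b) for b in range(hi - lo)]:
                candidates.update(table.get(v, ()))
        # Verify each candidate exactly, so no false positives are returned.
        return sorted(self.db[p] for p in candidates if hamming(self.db[p], q) <= self.k)
```

For example, with `db = [0b10110000, 0b10110011, 0b01001111, 0b10100000]`, `SegmentIndex(db, width=8, k=2).query(0b10110001)` returns `[160, 176, 179]`, the same set as `sorted(linear_scan(db, 0b10110001, 2))`. The trade-off shown is the one the abstract describes in general terms: more work per query (probing variants) in exchange for not storing extra copies of the database.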

ICDE - Large-scale Hamming Distance query processing

2011 IEEE 27th International Conference on Data Engineering, 2011

Co-Authors: Alex X Liu, Ke Shen, Eric Torng

Benjamin Sach - One of the best experts on this subject based on the ideXlab platform.

Tight cell-probe bounds for online Hamming Distance computation

Symposium on Discrete Algorithms, 2013

Co-Authors: Raphaël Clifford, Markus Jalsenius, Benjamin Sach

Abstract:

We show tight bounds for online Hamming Distance computation in the cell-probe model with word size $w$. The task is to output the Hamming Distance between a fixed string of length $n$ and the last $n$ symbols of a stream. We give a lower bound of $\Omega((\delta/w)\log n)$ time on average per output, where $\delta$ is the number of bits needed to represent an input symbol. We argue that this bound is tight within the model. The lower bound holds under randomisation and amortisation.
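To pin down the task that the lower bound concerns, here is a naive sketch (not from the paper) that outputs, for each arriving symbol, the Hamming Distance between the fixed pattern and the last n stream symbols. It rescans the whole window for every output, so it spends O(n) time per symbol; the function name `online_hamming` is hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence


def online_hamming(pattern: Sequence, stream: Iterable) -> Iterator[int]:
    """Yield the Hamming distance between `pattern` and the last len(pattern) stream symbols.

    Nothing is yielded until the first full window has arrived. Naive: every
    output rescans the whole window, i.e. O(n) work per arriving symbol.
    """
    n = len(pattern)
    window = deque(maxlen=n)  # the oldest symbol is evicted automatically
    for symbol in stream:
        window.append(symbol)
        if len(window) == n:
            yield sum(p != s for p, s in zip(pattern, window))
```

For example, `list(online_hamming("abc", "abcabd"))` gives `[0, 3, 3, 1]`, one value per window `abc`, `bca`, `cab`, `abd`.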

Tight cell-probe bounds for online Hamming Distance computation

arXiv: Data Structures and Algorithms, 2012

Co-Authors: Raphaël Clifford, Markus Jalsenius, Benjamin Sach

Abstract:

We show tight bounds for online Hamming Distance computation in the cell-probe model with word size $w$. The task is to output the Hamming Distance between a fixed string of length $n$ and the last $n$ symbols of a stream. We give a lower bound of $\Omega((d/w)\log n)$ time on average per output, where $d$ is the number of bits needed to represent an input symbol. We argue that this bound is tight within the model. The lower bound holds under randomisation and amortisation.

Raphaël Clifford - One of the best experts on this subject based on the ideXlab platform.

ICALP - Approximate Hamming Distance in a stream

2016

Co-Authors: Raphaël Clifford, Tatiana Starikovskaya

Abstract:

We consider the problem of computing a $(1+\epsilon)$-approximation of the Hamming Distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem. We show the following:
- If Alice and Bob both share the pattern and Alice has the first half of the stream and Bob the second half, then there is an $O(\epsilon^{-4} \log^2 n)$ bit randomised one-way communication protocol.
- If Alice has the pattern, Bob the first half of the stream and Charlie the second half, then there is an $O(\epsilon^{-2} \sqrt{n} \log n)$ bit randomised one-way communication protocol.

We then go on to develop small space streaming algorithms for $(1+\epsilon)$-approximate Hamming Distance which give worst case running time guarantees per arriving symbol.
- For binary input alphabets there is an $O(\epsilon^{-3} \sqrt{n} \log^2 n)$ space and $O(\epsilon^{-2} \log n)$ time streaming $(1+\epsilon)$-approximate Hamming Distance algorithm.
- For general input alphabets there is an $O(\epsilon^{-5} \sqrt{n} \log^4 n)$ space and $O(\epsilon^{-4} \log^3 n)$ time streaming $(1+\epsilon)$-approximate Hamming Distance algorithm.
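The output requirement can be written out as follows, with the pattern $P$ of length $n$, the stream $T$ indexed from 0, and $h_i$ the exact Hamming Distance for the window ending at position $i$. The symbols $P$, $T$, $h_i$ and $\tilde{h}_i$ are notation introduced here, and the one-sided band is an assumed convention; some works instead require $(1-\epsilon)h_i \le \tilde{h}_i \le (1+\epsilon)h_i$.

```latex
\begin{aligned}
h_i &= \bigl|\{\, j \in \{0,\dots,n-1\} : P[j] \neq T[i-n+1+j] \,\}\bigr|, && i \ge n-1,\\
h_i &\le \tilde{h}_i \le (1+\epsilon)\, h_i .
\end{aligned}
```

Each arriving symbol $T[i]$ must be answered with some $\tilde{h}_i$ in this band, within the stated worst-case time per symbol.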

Approximate Hamming Distance in a stream

International Colloquium on Automata Languages and Programming, 2016

Co-Authors: Raphaël Clifford, Tatiana Starikovskaya

Approximate Hamming Distance in a stream

arXiv: Data Structures and Algorithms, 2016

Co-Authors: Raphaël Clifford, Tatiana Starikovskaya

Abstract:

We consider the problem of computing a $(1+\epsilon)$-approximation of the Hamming Distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an $O(\epsilon^{-4} \log^2 n)$ bit randomised one-way communication protocol. (2) If only Alice has the pattern then there is an $O(\epsilon^{-2}\sqrt{n}\log n)$ bit randomised one-way communication protocol. We then go on to develop small space streaming algorithms for $(1+\epsilon)$-approximate Hamming Distance which give worst case running time guarantees per arriving symbol. (1) For binary input alphabets there is an $O(\epsilon^{-3} \sqrt{n} \log^{2} n)$ space and $O(\epsilon^{-2} \log{n})$ time streaming $(1+\epsilon)$-approximate Hamming Distance algorithm. (2) For general input alphabets there is an $O(\epsilon^{-5} \sqrt{n} \log^{4} n)$ space and $O(\epsilon^{-4} \log^3 {n})$ time streaming $(1+\epsilon)$-approximate Hamming Distance algorithm.
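For the binary-alphabet setting, a concrete baseline (exact, not approximate) packs the last n bits of the stream into one integer, so each arriving bit costs a shift, a mask, an XOR and a popcount. Unlike the small-space algorithms above, this sketch keeps the whole window; it only illustrates the streaming interface. The function name `binary_stream_hamming` is hypothetical.

```python
from typing import Iterable, Iterator, Sequence


def binary_stream_hamming(pattern: Sequence[int], stream: Iterable[int]) -> Iterator[int]:
    """Exact Hamming distance between a 0/1 pattern and the last n bits of a stream.

    The window is packed into one integer: bit (n-1-j) holds window position j,
    so the newest bit is always bit 0 and pattern alignment is a single XOR.
    """
    n = len(pattern)
    mask = (1 << n) - 1
    pat = 0
    for bit in pattern:
        pat = (pat << 1) | bit  # pattern[0] ends up as the most significant bit
    window, seen = 0, 0
    for bit in stream:
        window = ((window << 1) | bit) & mask  # shift in the newest bit, drop the oldest
        seen += 1
        if seen >= n:
            yield bin(window ^ pat).count("1")
```

For example, `list(binary_stream_hamming([1, 0, 1], [1, 0, 1, 1, 0, 1]))` gives `[0, 2, 2, 0]`.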

Tight cell-probe bounds for online Hamming Distance computation

Symposium on Discrete Algorithms, 2013

Co-Authors: Raphaël Clifford, Markus Jalsenius, Benjamin Sach

Tight cell-probe bounds for online Hamming Distance computation

arXiv: Data Structures and Algorithms, 2012

Co-Authors: Raphaël Clifford, Markus Jalsenius, Benjamin Sach

Alex X Liu - One of the best experts on this subject based on the ideXlab platform.

Large-scale Hamming Distance query processing

International Conference on Data Engineering, 2011

Co-Authors: Alex X Liu, Ke Shen, Eric Torng

ICDE - Large-scale Hamming Distance query processing

2011 IEEE 27th International Conference on Data Engineering, 2011

Co-Authors: Alex X Liu, Ke Shen, Eric Torng

BCB - ncRNA homology search using Hamming Distance seeds

Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine (BCB '11), 2011

Co-Authors: Osama Aljawad, Alex X Liu, Yanni Sun, Jikai Lei

Abstract:

Non-coding RNAs (ncRNAs) play important roles in many biological processes. Existing genome-scale ncRNA homology search tools identify ncRNAs in local sequence alignments generated by conventional sequence comparison methods. However, some types of ncRNA lack strong sequence conservation and tend to be missed by these methods. In this paper, we propose an ncRNA identification framework that is complementary to existing sequence comparison tools. By integrating a filtration step based on Hamming Distance with a local structural alignment program such as FOLDALIGN, we can identify ncRNAs that lack strong sequence conservation. We introduce a coding method by which the Hamming-Distance-based filtration can easily distinguish transitions from transversions, which occur with different frequencies in functional ncRNAs. Our experiments demonstrate that the carefully designed Hamming Distance seed can achieve better sensitivity in searching for poorly conserved ncRNAs than conventional sequence comparison tools.
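The abstract does not describe the coding method itself. As a hypothetical illustration of the idea: with 2 bits per base, each code has only one other code at distance 2, so the two transversion partners of a base cannot both cost more than its transition partner. The sketch below therefore uses an assumed 3-bit code in which a transition (A/G or C/U) flips 1 bit and a transversion flips 2 or 3 bits. The code table, `encode`, `encoded_hamming` and the threshold filter `passes_filter` are assumptions, not the published scheme.

```python
# Hypothetical 3-bit code (an assumption, not the paper's published scheme):
# purines A/G share the prefix 00, pyrimidines C/U share the prefix 11,
# so a transition flips 1 bit and a transversion flips 2 or 3 bits.
CODE = {"A": 0b000, "G": 0b001, "C": 0b110, "U": 0b111, "T": 0b111}


def encode(seq: str) -> int:
    """Pack a nucleotide sequence into an integer, 3 bits per base."""
    value = 0
    for base in seq.upper():
        value = (value << 3) | CODE[base]
    return value


def encoded_hamming(a: str, b: str) -> int:
    """Hamming distance between the encodings of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return bin(encode(a) ^ encode(b)).count("1")


def passes_filter(a: str, b: str, max_bits: int) -> bool:
    """Hypothetical seed-style filter: keep a window pair if its encoded distance is small."""
    return encoded_hamming(a, b) <= max_bits
```

For example, `encoded_hamming("ACGU", "GCGU")` is 1 (one transition), while `encoded_hamming("ACGU", "CCGU")` is 2 (one transversion), so a bit budget in `passes_filter` tolerates transitions more readily than transversions.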

Anca L Ralescu - One of the best experts on this subject based on the ideXlab platform.

Adaptive measures of similarity---fuzzy Hamming Distance---and its applications to pattern recognition problems

2006

Co-Authors: Anca L Ralescu, M Ionescu

Abstract:

Similarity measures are the basis of most machine learning and pattern recognition algorithms. The choice of similarity measure determines how effectively an algorithm solves a specific problem, which is why finding a relevant similarity measure is an active area of research in machine learning and pattern recognition. Hamming Distance is a simple and efficient similarity measure, but because it was designed for binary vectors, it cannot be applied to the many problems that use real-valued vectors. This thesis builds upon and extends a generalization of the Hamming Distance, the Fuzzy Hamming Distance (FHD), which operates on real-valued vectors while keeping the same meaning as the Hamming Distance: the number of different elements. To assess the effectiveness of this measure, FHD is used in several experiments as the basis of a content-based image retrieval system, a banknote validation system, and a knowledge discovery system based on conceptual spaces.
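A minimal sketch of a fuzzy Hamming-style distance, assuming each component's degree of difference is μ_i = 1 − exp(−α(x_i − y_i)²) and that the degrees are summed (a sigma-count). With binary vectors and a large α, every differing component contributes almost exactly 1, so the value approaches the ordinary Hamming Distance, which matches the "number of different elements" reading above. The membership function, the parameter `alpha` and the name `fuzzy_hamming` are assumptions; the thesis's exact definition may differ.

```python
import math
from typing import Sequence


def fuzzy_hamming(x: Sequence[float], y: Sequence[float], alpha: float = 1.0) -> float:
    """Sigma-count of the fuzzy set of 'different' components (hypothetical form).

    Each component contributes a degree of difference in [0, 1):
    0 when x_i == y_i, approaching 1 as |x_i - y_i| grows.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return sum(1.0 - math.exp(-alpha * (a - b) ** 2) for a, b in zip(x, y))
```

For example, `fuzzy_hamming([1, 0, 1, 1], [1, 1, 0, 1], alpha=50)` is 2.0 to within floating-point precision, the ordinary Hamming Distance of those bit vectors, while small real-valued differences contribute only fractional amounts.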

Fuzzy Hamming Distance in a content-based image retrieval system

IEEE International Conference on Fuzzy Systems, 2004

Co-Authors: M Ionescu, Anca L Ralescu

Abstract:

The performance of content-based image retrieval (CBIR) systems depends mainly on the image similarity measure they use. The fuzzy Hamming Distance (D) is an extension of the Hamming Distance to real-valued vectors. Because the feature space of each image is real-valued, the fuzzy Hamming Distance can be successfully used as an image similarity measure. The current study reports the results of applying D as a similarity measure between the color histograms of two images. The fuzzy Hamming Distance is suitable for this application because it takes into account not only the number of different colors but also the magnitude of these differences.
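Applied to retrieval, a sketch along the lines of this abstract normalizes each color histogram and ranks database images by their fuzzy difference from the query, so both the number of differing bins and the size of each difference count. The per-bin membership function, `alpha`, and the names `normalize` and `rank_images` are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def fuzzy_hamming(h1: Sequence[float], h2: Sequence[float], alpha: float) -> float:
    """Hypothetical fuzzy Hamming-style distance: each bin contributes 1 - exp(-alpha * diff^2)."""
    return sum(1.0 - math.exp(-alpha * (a - b) ** 2) for a, b in zip(h1, h2))


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a color histogram so its bins sum to 1, making images of different sizes comparable."""
    total = float(sum(hist))
    return [v / total for v in hist] if total else [0.0] * len(hist)


def rank_images(query: Sequence[float], database: Dict[str, Sequence[float]],
                alpha: float = 100.0) -> List[str]:
    """Return image names ordered from most to least similar to the query histogram."""
    q = normalize(query)
    scores = {name: fuzzy_hamming(q, normalize(h), alpha) for name, h in database.items()}
    return sorted(scores, key=scores.get)
```

For instance, with histograms `{"sunset": [10, 60, 30], "forest": [5, 15, 80], "beach": [20, 50, 30]}` and query `[12, 58, 30]`, `rank_images` returns `["sunset", "beach", "forest"]`.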

FUZZ-IEEE - Fuzzy Hamming Distance in a content-based image retrieval system

2004 IEEE International Conference on Fuzzy Systems (IEEE Cat. No.04CH37542), 1

Co-Authors: M Ionescu, Anca L Ralescu

FUZZ-IEEE - Fuzzy Hamming Distance Based Banknote Validator

The 14th IEEE International Conference on Fuzzy Systems 2005. FUZZ '05., 1

Co-Authors: M Ionescu, Anca L Ralescu

Abstract:

Banknote validation systems are used to discriminate between genuine and counterfeit banknotes. The paper proposes a one-class classifier for the genuine class using a new similarity measure based on the fuzzy Hamming Distance. For each banknote, several regions are considered (corresponding to security features), and each region is split into m × n partitions to include position information. The feature space used by the classifier consists of the color histograms of each partition. The fuzzy Hamming Distance proves to have good discriminative power, being able to discriminate completely between the genuine and counterfeit banknotes.
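The validation pipeline described above (an m × n grid of partitions per region, one color histogram per partition, and a one-class decision for the genuine class) can be sketched as follows. The quantization of pixels into color bins, the per-cell fuzzy difference, summing cells into a note distance, the nearest-genuine-reference rule and the `threshold` value are all assumptions; the abstract does not specify them.

```python
import math
from typing import List, Sequence

Region = List[List[int]]  # assumption: each pixel is already quantized to a color-bin index
Features = List[List[float]]


def grid_histograms(region: Region, m: int, n: int, bins: int) -> Features:
    """Split a region into m x n cells and return one normalized color histogram per cell."""
    rows, cols = len(region), len(region[0])
    features = []
    for i in range(m):
        for j in range(n):
            hist = [0.0] * bins
            for r in range(i * rows // m, (i + 1) * rows // m):
                for c in range(j * cols // n, (j + 1) * cols // n):
                    hist[region[r][c]] += 1
            total = sum(hist) or 1.0  # guard against empty cells
            features.append([h / total for h in hist])
    return features


def cell_distance(h1: Sequence[float], h2: Sequence[float], alpha: float) -> float:
    """Hypothetical fuzzy Hamming-style distance between two cell histograms."""
    return sum(1.0 - math.exp(-alpha * (a - b) ** 2) for a, b in zip(h1, h2))


def note_distance(f1: Features, f2: Features, alpha: float) -> float:
    """Sum the cell distances, keeping cells aligned so position information matters."""
    return sum(cell_distance(a, b, alpha) for a, b in zip(f1, f2))


def is_genuine(candidate: Features, genuine_refs: List[Features],
               threshold: float, alpha: float = 50.0) -> bool:
    """One-class decision: accept only if some genuine reference is close enough."""
    return min(note_distance(candidate, ref, alpha) for ref in genuine_refs) <= threshold
```

Because only genuine references are needed, the classifier never has to see counterfeit samples during training, which is the point of the one-class formulation in the abstract.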
