Online Dictionary

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 9429 Experts worldwide ranked by ideXlab platform

Chang Wen Chen - One of the best experts on this subject based on the ideXlab platform.

  • Sparse Representation With Spatio-Temporal Online Dictionary Learning for Promising Video Coding
    IEEE Transactions on Image Processing, 2016
    Co-Authors: Wenrui Dai, Xin Tang, Hongkai Xiong, Yangmei Shen, Junni Zou, Chang Wen Chen
    Abstract:

    Classical dictionary learning methods for video coding suffer from high computational complexity, and their coding efficiency is impaired because they disregard the underlying distribution of the training data. This paper proposes a spatio-temporal online dictionary learning (STOL) algorithm that speeds up the convergence of dictionary learning while guaranteeing the approximation error. The proposed algorithm uses stochastic gradient descent to form a dictionary of pairs of 3D low-frequency and high-frequency spatio-temporal volumes. In each iteration of the learning process, it randomly selects one sample volume and updates the dictionary atoms by minimizing the expected cost, rather than optimizing the empirical cost over the complete training data as batch learning methods such as K-SVD do. Since the selected volumes are assumed to be independent and identically distributed samples from the underlying distribution, the decomposition coefficients obtained from the trained dictionary are well suited for sparse representation. Theoretically, the proposed STOL is proved to achieve a better approximation for sparse representation than K-SVD while maintaining both structured sparsity and hierarchical sparsity. It is shown to outperform batch gradient descent methods (K-SVD) in convergence speed and computational complexity, and its upper bound on the prediction error is asymptotically equal to the training error. Extensive experiments validate that, with lower computational complexity, the STOL-based coding scheme achieves improvements over H.264/AVC, High Efficiency Video Coding, and existing super-resolution-based methods in rate-distortion performance and visual quality.
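The core update loop described above (sample one training example, sparse-code it against the current dictionary, take a stochastic gradient step on the selected atoms) can be sketched in a few lines. This is a generic online dictionary learning toy on 1D signals with a single-atom matching-pursuit code, not the paper's STOL algorithm, which learns paired 3D low-/high-frequency spatio-temporal volumes; the function name and parameters are illustrative.

```python
import numpy as np

def online_dictionary_learning(signals, n_atoms=8, n_iters=500, lr=0.1, seed=0):
    """Toy online dictionary learning: one random i.i.d. sample per
    iteration (not a full batch as in K-SVD), a single-atom matching-pursuit
    sparse code, and a stochastic gradient step on the chosen atom."""
    rng = np.random.default_rng(seed)
    dim = signals.shape[1]
    D = rng.standard_normal((dim, n_atoms))
    D /= np.linalg.norm(D, axis=0)                # unit-norm atoms
    for _ in range(n_iters):
        x = signals[rng.integers(len(signals))]   # one random sample volume
        scores = D.T @ x
        k = int(np.argmax(np.abs(scores)))        # best-matching atom
        coef = scores[k]
        residual = x - coef * D[:, k]             # reconstruction error
        # Descend the expected cost: grad of 0.5*||x - coef*d_k||^2 wrt d_k
        D[:, k] += lr * coef * residual
        D[:, k] /= np.linalg.norm(D[:, k])        # keep the atom unit-norm
    return D
```

Because each step touches only the one sampled signal and one atom, the per-iteration cost stays constant in the training-set size, which is the source of the convergence-speed advantage over batch methods claimed above.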

Tsvi Kopelowitz - One of the best experts on this subject based on the ideXlab platform.

  • Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap
    International Symposium on Algorithms and Computation, 2016
    Co-Authors: Tsvi Kopelowitz, Ely Porat, Amihood Amir, Avivit Levy, Seth Pettie, Riva B Shalom
    Abstract:

    We examine the complexity of the Online Dictionary Matching with One Gap problem (DMOG): preprocess a dictionary D of d patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, one character at a time, we can report all patterns from D that are suffixes of the text seen so far before the next character arrives. In more general versions, the gap symbols are associated with bounds determining the possible lengths of the matching strings. Online DMOG captures the difficulty in a bottleneck procedure for cyber-security, as many digital signatures of viruses manifest themselves as patterns with a single gap. In this paper, we demonstrate that the difficulty of obtaining efficient solutions for the DMOG problem, even in the offline setting, can be traced back to the infamous 3SUM conjecture. We show a conditional lower bound of Omega(delta(G_D) + op) time per text character, where G_D is a bipartite graph that captures the structure of D, delta(G_D) is the degeneracy of this graph, and op is the output size. Moreover, we show a conditional lower bound in terms of the magnitude of the gaps for the bounded case, thereby showing that some known offline upper bounds are essentially optimal. We also provide matching upper bounds (up to sub-polynomial factors), in terms of the degeneracy, for the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on delta(G_D). Our algorithms make use of graph orientations, together with some additional techniques. They are of practical interest because, although delta(G_D) can be as large as sqrt(d), and even larger if G_D is a multi-graph, it is typically a very small constant in practice. Finally, when delta(G_D) is large, we are able to obtain even more efficient solutions.
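The quantity delta(G_D) above is the graph's degeneracy: the smallest k such that every subgraph contains a vertex of degree at most k. It can be computed by repeatedly peeling off a minimum-degree vertex (the classic Matula-Beck procedure); the sketch below assumes an adjacency-set representation and uses a naive O(n) minimum search per step for clarity. It illustrates the parameter only and is not part of the paper's algorithms.

```python
def degeneracy(adj):
    """Degeneracy of an undirected graph given as {vertex: set_of_neighbors}.
    Repeatedly remove a minimum-degree vertex; the answer is the largest
    degree seen at the moment of removal (Matula & Beck peeling order)."""
    deg = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    k = 0
    while remaining:
        v = min(remaining, key=deg.get)   # peel a current minimum-degree vertex
        k = max(k, deg[v])                # degeneracy = max degree at removal
        remaining.remove(v)
        for u in adj[v]:                  # removing v lowers neighbors' degrees
            if u in remaining:
                deg[u] -= 1
    return k
```

For example, a star graph has degeneracy 1 however many leaves it has, matching the abstract's point that delta(G_D) is typically a very small constant even when vertex degrees are large.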

  • Succinct Online Dictionary Matching with Improved Worst-Case Guarantees
    Combinatorial Pattern Matching, 2016
    Co-Authors: Tsvi Kopelowitz, Ely Porat, Yaron Rozen
    Abstract:

    In the online dictionary matching problem, the goal is to preprocess a set of patterns D = {P_1, ..., P_d} over an alphabet Sigma, so that given an online text (one character at a time) we report all occurrences of patterns that are a suffix of the current text before the following character arrives. We introduce a succinct Aho-Corasick-like data structure for the online dictionary matching problem. Our solution uses a new succinct representation for multi-labeled trees, in which each node has a set of labels drawn from a universe of size lambda. We consider lowest labeled ancestor (LLA) queries on multi-labeled trees: given a node and a label, return the lowest proper ancestor of the node that has the queried label. In this paper we introduce a succinct representation of multi-labeled trees for lambda = omega(1) that supports LLA queries in O(log(log(lambda))) time. Using this representation of multi-labeled trees, we introduce a succinct data structure for the online dictionary matching problem when sigma = omega(1). In this solution the worst-case cost per character is O(log(log(sigma)) + occ) time, where occ is the size of the current output. Moreover, the amortized cost per character is O(1 + occ) time.
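For context, the online interface described above (consume one character, then report every dictionary pattern that is a suffix of the text so far) is exactly what a classical pointer-based Aho-Corasick automaton provides; the paper's contribution is supporting the same queries in succinct space. A minimal non-succinct baseline sketch:

```python
from collections import deque

class AhoCorasick:
    """Plain (pointer-based) Aho-Corasick automaton for online dictionary
    matching: feed one character at a time; after each character, get every
    pattern that is a suffix of the text read so far. Illustrative baseline
    only -- not the succinct representation the paper constructs."""

    def __init__(self, patterns):
        self.goto = [{}]        # per-state transition maps (trie edges)
        self.fail = [0]         # failure links
        self.out = [[]]         # patterns ending at each state
        for p in patterns:      # build the trie of the dictionary
            s = 0
            for c in p:
                if c not in self.goto[s]:
                    self.goto[s][c] = len(self.goto)
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                s = self.goto[s][c]
            self.out[s].append(p)
        q = deque(self.goto[0].values())   # BFS to compute failure links
        while q:
            s = q.popleft()
            for c, t in self.goto[s].items():
                q.append(t)
                f = self.fail[s]
                while f and c not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(c, 0)
                self.out[t] += self.out[self.fail[t]]  # inherit shorter matches
        self.state = 0

    def feed(self, c):
        """Consume one text character; return all patterns ending here."""
        while self.state and c not in self.goto[self.state]:
            self.state = self.fail[self.state]
        self.state = self.goto[self.state].get(c, 0)
        return self.out[self.state]
```

On the classic example, feeding the text "ushers" against the dictionary {"he", "she", "his", "hers"} reports "she" and "he" after the 'e' and "hers" after the final 's', each before the next character is read.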

Yaron Rozen - One of the best experts on this subject based on the ideXlab platform.

  • Succinct Online Dictionary Matching with Improved Worst-Case Guarantees
    Combinatorial Pattern Matching, 2016
    Co-Authors: Tsvi Kopelowitz, Ely Porat, Yaron Rozen
    Abstract:

    In the online dictionary matching problem, the goal is to preprocess a set of patterns D = {P_1, ..., P_d} over an alphabet Sigma, so that given an online text (one character at a time) we report all occurrences of patterns that are a suffix of the current text before the following character arrives. We introduce a succinct Aho-Corasick-like data structure for the online dictionary matching problem. Our solution uses a new succinct representation for multi-labeled trees, in which each node has a set of labels drawn from a universe of size lambda. We consider lowest labeled ancestor (LLA) queries on multi-labeled trees: given a node and a label, return the lowest proper ancestor of the node that has the queried label. In this paper we introduce a succinct representation of multi-labeled trees for lambda = omega(1) that supports LLA queries in O(log(log(lambda))) time. Using this representation of multi-labeled trees, we introduce a succinct data structure for the online dictionary matching problem when sigma = omega(1). In this solution the worst-case cost per character is O(log(log(sigma)) + occ) time, where occ is the size of the current output. Moreover, the amortized cost per character is O(1 + occ) time.

Taha Yasseri - One of the best experts on this subject based on the ideXlab platform.

  • Emo, Love and God: Making Sense of Urban Dictionary, a Crowd-Sourced Online Dictionary
    Royal Society Open Science, 2018
    Co-Authors: Dong Nguyen, Barbara Mcgillivray, Taha Yasseri
    Abstract:

    The Internet facilitates large-scale collaborative projects, and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the 'wisdom of the crowd' has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. On the other hand, the decentralized and often unmonitored environment of such projects may make them susceptible to low-quality content. In this work, we focus on Urban Dictionary, a crowd-sourced online dictionary. We combine computational methods with qualitative annotation to shed light on the overall features of Urban Dictionary in terms of growth, coverage, and types of content. We measure a high presence of opinion-focused entries, as opposed to the meaning-focused entries we expect from traditional dictionaries. Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as proper nouns. Urban Dictionary also contains offensive content, but highly offensive content tends to receive lower scores through the dictionary's voting system. The low threshold for including new material in Urban Dictionary enables quick recording of new words and new meanings, but the resulting heterogeneous content can pose challenges when using Urban Dictionary as a source to study language innovation.

  • Emo, Love and God: Making Sense of Urban Dictionary, a Crowd-Sourced Online Dictionary
    arXiv: Computation and Language, 2017
    Co-Authors: Dong Nguyen, Barbara Mcgillivray, Taha Yasseri
    Abstract:

    The Internet facilitates large-scale collaborative projects. The emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the "wisdom of the crowd" has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. On the other hand, the decentralized and often unmonitored environment of such projects may make them susceptible to systematic malfunction and misbehavior. In this work, we focus on Urban Dictionary, a crowd-sourced online dictionary. We combine computational methods with qualitative annotation to shed light on the overall features of Urban Dictionary in terms of growth, coverage, and types of content. We measure a high presence of opinion-focused entries, as opposed to the meaning-focused entries we expect from traditional dictionaries. Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as proper nouns. There is also a high presence of offensive content, but highly offensive content tends to receive lower scores through the voting system. Our study highlights that Urban Dictionary has higher content heterogeneity than traditional dictionaries, which poses challenges for processing but also offers opportunities to analyze and track language innovation.

Wenrui Dai - One of the best experts on this subject based on the ideXlab platform.

  • Sparse Representation With Spatio-Temporal Online Dictionary Learning for Promising Video Coding
    IEEE Transactions on Image Processing, 2016
    Co-Authors: Wenrui Dai, Xin Tang, Hongkai Xiong, Yangmei Shen, Junni Zou, Chang Wen Chen
    Abstract:

    Classical dictionary learning methods for video coding suffer from high computational complexity, and their coding efficiency is impaired because they disregard the underlying distribution of the training data. This paper proposes a spatio-temporal online dictionary learning (STOL) algorithm that speeds up the convergence of dictionary learning while guaranteeing the approximation error. The proposed algorithm uses stochastic gradient descent to form a dictionary of pairs of 3D low-frequency and high-frequency spatio-temporal volumes. In each iteration of the learning process, it randomly selects one sample volume and updates the dictionary atoms by minimizing the expected cost, rather than optimizing the empirical cost over the complete training data as batch learning methods such as K-SVD do. Since the selected volumes are assumed to be independent and identically distributed samples from the underlying distribution, the decomposition coefficients obtained from the trained dictionary are well suited for sparse representation. Theoretically, the proposed STOL is proved to achieve a better approximation for sparse representation than K-SVD while maintaining both structured sparsity and hierarchical sparsity. It is shown to outperform batch gradient descent methods (K-SVD) in convergence speed and computational complexity, and its upper bound on the prediction error is asymptotically equal to the training error. Extensive experiments validate that, with lower computational complexity, the STOL-based coding scheme achieves improvements over H.264/AVC, High Efficiency Video Coding, and existing super-resolution-based methods in rate-distortion performance and visual quality.