Wildcard

The experts below are selected from a list of 3957 experts worldwide ranked by the ideXlab platform.

Xing Quan Zhu - One of the best experts on this subject based on the ideXlab platform.

Efficient sequential pattern mining with Wildcards for keyphrase extraction

Knowledge-Based Systems, 2017

Co-Authors: Fei Xie, Xin Dong Wu, Xing Quan Zhu

Abstract:

A keyphrase (a multi-word unit) in a document denotes one or multiple keywords capturing a main topic of the underlying document. Finding good keyphrases of a document can quickly summarize knowledge for efficient decision making and benefit domains involving intensive text information. To date, existing keyphrase extraction methods cannot be customized to each specific document, mainly because the patterns they use to form keyphrases are too restrictive and may not capture flexible keyword relationships inside the text. In this paper, we propose a sequential pattern mining based document-specific keyphrase extraction method. Our key innovation is to use Wildcards (or gap constraints) to help extract sequential patterns, so the flexible Wildcard constraints within a pattern can capture semantic relationships between words, and the system will have full flexibility to discover different types of sequential patterns as candidates for keyphrase extraction. To achieve the goal, we regard each single document as a sequential dataset, and propose an efficient algorithm to mine sequential patterns with Wildcard and one-off conditions that allows important keyphrases to be captured during the mining process. For each extracted keyphrase candidate, we use some statistical pattern features to characterize it, and further collect all keyphrases from the document to form a training set. A supervised learning classifier is trained to identify keyphrases from a test document. Because our pattern mining and pattern characterization processes are customized to each single document, keyphrases extracted by our method are highly specific for each document. Experimental results demonstrate that the proposed sequential pattern mining method outperforms existing pattern mining methods in both runtime performance and completeness. Comparisons on keyphrase benchmark datasets also confirm that the proposed document-specific keyphrase extraction method is effective in improving the quality of extracted keyphrases.
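To make the two constraints in the abstract concrete, here is a minimal sketch of counting occurrences of a word pattern under a gap (wildcard) constraint and a one-off condition, where no position of the sequence is used by more than one occurrence. The function names `one_off_support` and `_extend`, the single shared gap range, and the greedy leftmost strategy are assumptions made for illustration. This is not the algorithm proposed in the paper, and a greedy scan is not guaranteed to find the maximum number of one-off occurrences.

```python
from typing import List, Optional, Sequence


def _extend(seq: Sequence[str], pattern: Sequence[str], used: List[bool],
            positions: List[int], min_gap: int, max_gap: int) -> Optional[List[int]]:
    """Depth-first search for the leftmost completion of a partial occurrence."""
    if len(positions) == len(pattern):
        return positions
    last = positions[-1]
    item = pattern[len(positions)]
    lo = last + 1 + min_gap                      # skip at least min_gap wildcards
    hi = min(last + 1 + max_gap, len(seq) - 1)   # skip at most max_gap wildcards
    for nxt in range(lo, hi + 1):
        if not used[nxt] and seq[nxt] == item:
            result = _extend(seq, pattern, used, positions + [nxt], min_gap, max_gap)
            if result is not None:
                return result
    return None


def one_off_support(seq: Sequence[str], pattern: Sequence[str],
                    min_gap: int, max_gap: int) -> int:
    """Greedily count occurrences of `pattern` in `seq` such that consecutive
    pattern items are min_gap..max_gap positions apart and no sequence
    position is reused across occurrences (one-off condition)."""
    if not pattern:
        return 0
    used = [False] * len(seq)
    count = 0
    for start in range(len(seq)):
        if used[start] or seq[start] != pattern[0]:
            continue
        positions = _extend(seq, pattern, used, [start], min_gap, max_gap)
        if positions is not None:
            for p in positions:
                used[p] = True   # mark positions so they cannot be reused
            count += 1
    return count


tokens = "pattern mining with wildcards for sequential pattern mining".split()
print(one_off_support(tokens, ["pattern", "mining"], 0, 1))     # 2
print(one_off_support(tokens, ["sequential", "mining"], 1, 1))  # 1
```

Treating the document as a list of word tokens mirrors the abstract's idea of regarding each single document as a sequential dataset. The statistical features and the classifier are outside the scope of this sketch.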


Jeffrey Scott Vitter - One of the best experts on this subject based on the ideXlab platform.

Space-Efficient String Indexing for Wildcard Pattern Matching

Symposium on Theoretical Aspects of Computer Science, 2014

Co-Authors: Moshe Lewenstein, Yakov Nekrich, Jeffrey Scott Vitter

Abstract:

In this paper we describe compressed indexes that support pattern matching queries for strings with Wildcards. For a constant-size alphabet, our data structure uses \(O(n \log^{\varepsilon} n)\) bits for any \(\varepsilon > 0\) and reports all \(\mathtt{occ}\) occurrences of a Wildcard string in \(O(m + \sigma^{g} \cdot M(n) + \mathtt{occ})\) time, where \(M(n) = o(\log\log\log n)\), \(\sigma\) is the alphabet size, \(m\) is the number of alphabet symbols, and \(g\) is the number of Wildcard symbols in the query string. We also present an \(O(n)\)-bit index with \(O((m + \sigma^{g} + \mathtt{occ}) \log^{\varepsilon} n)\) query time and an \(O(n (\log\log n)^{2})\)-bit index with \(O((m + \sigma^{g} + \mathtt{occ}) \log\log n)\) query time. These are the first non-trivial data structures for this problem that need \(o(n \log n)\) bits of space.
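For readers unfamiliar with the query type, the following is a minimal sketch of what a wildcard pattern matching query returns, written as a naive scan with no index. The function name `wildcard_occurrences` and the choice of `?` as the wildcard symbol are assumptions for illustration. The compressed indexes described in the abstract answer the same query far faster and are not reproduced here.

```python
from typing import List


def wildcard_occurrences(text: str, pattern: str, wildcard: str = "?") -> List[int]:
    """Return every start position where `pattern` matches `text`,
    with each wildcard symbol in the pattern matching any single character."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if all(p == wildcard or p == c for p, c in zip(pattern, window)):
            hits.append(i)
    return hits


print(wildcard_occurrences("abracadabra", "a?a"))   # [3, 5]
print(wildcard_occurrences("abracadabra", "a??a"))  # [0, 7]
```

This baseline takes time proportional to the text length times the pattern length for every query, which is the cost an index is meant to avoid.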

Compressed text indexing with Wildcards

Journal of Discrete Algorithms, 2013

Co-Authors: Tsung-han Ku, Rahul Shah, Sharma V. Thankachan, Jeffrey Scott Vitter

Abstract:

Let \(T = T_1 \phi^{k_1} T_2 \phi^{k_2} \cdots \phi^{k_d} T_{d+1}\) be a text of total length \(n\), where characters of each \(T_i\) are chosen from an alphabet \(\Sigma\) of size \(\sigma\), and \(\phi\) denotes a Wildcard symbol. The text indexing with Wildcards problem is to index \(T\) such that when we are given a query pattern \(P\), we can locate the occurrences of \(P\) in \(T\) efficiently. This problem has been applied in indexing genomic sequences that contain single-nucleotide polymorphisms (SNP) because SNP can be modeled as Wildcards. Recently Tam et al. (2009) and Thachuk (2011) have proposed succinct indexes for this problem. In this paper, we present the first compressed index for this problem, which takes only \(n H_h + o(n \log \sigma) + O(d \log n)\) bits of space, where \(H_h\) is the \(h\)th-order empirical entropy (\(h = o(\log_\sigma n)\)) of \(T\).

SPIRE - Compressed text indexing with Wildcards

String Processing and Information Retrieval, 2011

Co-Authors: Tsung-han Ku, Rahul Shah, Sharma V. Thankachan, Jeffrey Scott Vitter

Abstract:

Let \(T = T_1 \phi^{k_1} T_2 \phi^{k_2} \cdots \phi^{k_d} T_{d+1}\) be a text of total length \(n\), where characters of each \(T_i\) are chosen from an alphabet \(\Sigma\) of size \(\sigma\), and \(\phi\) denotes a Wildcard symbol. The text indexing with Wildcards problem is to index \(T\) such that when we are given a query pattern \(P\), we can locate the occurrences of \(P\) in \(T\) efficiently. This problem has been applied in indexing genomic sequences that contain single-nucleotide polymorphisms (SNP) because SNP can be modeled as Wildcards. Recently Tam et al. (2009) and Thachuk (2011) have proposed succinct indexes for this problem. In this paper, we present the first compressed index for this problem, which takes only \(n H_h + o(n \log \sigma) + O(d \log n)\) bits of space, where \(H_h\) is the \(h\)th-order empirical entropy (\(h = o(\log_\sigma n)\)) of \(T\).
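The notation above, in which the wildcards sit in the text rather than in the query, can be illustrated with a small sketch. It splits a text into its segments and wildcard group lengths, then finds pattern occurrences by a naive scan. The names `decompose` and `occurrences_in_wildcard_text`, and the use of `N` as the wildcard symbol, are assumptions for illustration. This is not the compressed index from the paper.

```python
import re
from typing import List, Tuple


def decompose(text: str, wildcard: str = "N") -> Tuple[List[str], List[int]]:
    """Split `text` into segments T_1..T_{d+1} and wildcard group lengths k_1..k_d."""
    group = re.escape(wildcard) + "+"
    segments = re.split(group, text)
    lengths = [len(g) for g in re.findall(group, text)]
    return segments, lengths


def occurrences_in_wildcard_text(text: str, pattern: str, wildcard: str = "N") -> List[int]:
    """Return start positions where `pattern` aligns with `text`,
    with each wildcard symbol in the text matching any pattern character."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if all(t == wildcard or t == p for t, p in zip(text[i:i + m], pattern))]


print(decompose("ACGNNTTNA"))                            # (['ACG', 'TT', 'A'], [2, 1])
print(occurrences_in_wildcard_text("ACGNNTTNA", "GAT"))  # [2, 3]
```

In the example, the text has \(d = 2\) wildcard groups of lengths 2 and 1, and the pattern `GAT` aligns at positions 2 and 3 because the wildcard positions can stand for any nucleotide.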


Xin Dong Wu - One of the best experts on this subject based on the ideXlab platform.

Efficient sequential pattern mining with Wildcards for keyphrase extraction

Knowledge-Based Systems, 2017

Co-Authors: Fei Xie, Xin Dong Wu, Xing Quan Zhu

Document-Specific Keyphrase Extraction Using Sequential Patterns with Wildcards

International Conference on Data Mining, 2014

Co-Authors: Xin Dong Wu

Abstract:

Finding good key phrases for a document is beneficial for many applications, such as text summarization, browsing, and indexing. In this paper, we propose a sequential pattern mining based document-specific key phrase extraction method. Our key innovation is to use Wildcards (or gap constraints) to help extract sequential patterns, where the flexible Wildcard constraints within a pattern can capture semantic relationships between words. To achieve this goal, we regard each single document as a sequential dataset, and propose an efficient algorithm to mine sequential patterns with Wildcard and one-off conditions that allows important key phrases to be captured during the mining process. For each extracted key phrase candidate, we use some statistical pattern features to characterize it. A supervised learning classifier is trained to identify key phrases from a test document. Comparisons on key phrase benchmark datasets confirm that our document-specific key phrase extraction method is effective in improving the quality of extracted key phrases.

Pattern Matching with Flexible Wildcards

Journal of Computer Science and Technology, 2014

Co-Authors: Xin Dong Wu, Jipeng Qiang

Abstract:

Pattern matching with Wildcards (PMW) has great theoretical and practical significance in bioinformatics, information retrieval, and pattern mining. Due to the uncertainty of Wildcards, not only is the number of all matches exponential with respect to the maximal gap flexibility and the pattern length, but the matching positions in PMW are also hard to choose. Counting the maximal number of matches one by one is computationally infeasible. Therefore, rather than solving the generic PMW problem, many research efforts have further defined new problems within PMW according to different application backgrounds. To break through the limitations of either fixing the number or allowing an unbounded number of Wildcards, pattern matching with flexible Wildcards (PMFW) allows the users to control the ranges of Wildcards. In this paper, we provide a survey on the state-of-the-art algorithms for PMFW, with detailed analyses and comparisons, and discuss challenges and opportunities in PMFW research and applications.
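As an illustration of user-controlled wildcard ranges, here is a minimal sketch that counts all matches of a pattern whose adjacent characters are separated by a gap within a given range. It uses dynamic programming over end positions, so matches are counted without listing them one by one. The function name `count_matches` and the `(min_gap, max_gap)` encoding of each range are assumptions for illustration. This is one simple formulation and not a specific algorithm from the survey.

```python
from typing import List, Sequence, Tuple


def count_matches(text: str, chars: Sequence[str],
                  gaps: Sequence[Tuple[int, int]]) -> int:
    """Count position tuples matching chars[0] g_0 chars[1] g_1 ... in `text`,
    where gaps[k] = (min_gap, max_gap) bounds the number of wildcard
    positions between chars[k] and chars[k + 1]."""
    n = len(text)
    # ways[j] = number of partial matches whose latest character sits at position j
    ways: List[int] = [1 if c == chars[0] else 0 for c in text]
    for ch, (lo, hi) in zip(chars[1:], gaps):
        nxt = [0] * n
        for j in range(n):
            if text[j] != ch:
                continue
            start = max(0, j - 1 - hi)   # earliest allowed previous position
            end = j - 1 - lo             # latest allowed previous position
            if end >= 0:
                nxt[j] = sum(ways[start:end + 1])
        ways = nxt
    return sum(ways)


print(count_matches("aaabbb", "ab", [(0, 2)]))           # 6
print(count_matches("abcabc", "abc", [(0, 1), (0, 1)]))  # 2
```

The count can grow very quickly with the gap ranges and pattern length, which is why the sketch accumulates counts per end position rather than enumerating each match.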

Mining sequential patterns with periodic Wildcard gaps

Applied Intelligence, 2014

Co-Authors: Youxi Wu, Lingling Wang, Wei Ding, Xin Dong Wu

Abstract:

Mining frequent patterns with periodic Wildcard gaps is a critical data mining problem for dealing with complex real-world problems. The problem can be described as follows: given a subject sequence, a pre-specified threshold, and a variable gap length with Wildcards between each two consecutive letters, the task is to find all frequent patterns with periodic Wildcard gaps. State-of-the-art mining algorithms that use matrices or other linear data structures to solve the problem not only consume a large amount of memory but also run slowly. In this study, we use an Incomplete Nettree structure (the last layer of a Nettree which is an extension of a tree) of a sub-pattern P to efficiently create Incomplete Nettrees of all its super-patterns with prefix pattern P and compute the numbers of their supports in a one-way scan. We propose two new algorithms, MAPB (Mining sequentiAl Pattern using incomplete Nettree with Breadth first search) and MAPD (Mining sequentiAl Pattern using incomplete Nettree with Depth first search), to solve the problem effectively with low memory requirements. Furthermore, we design a heuristic algorithm MAPBOK (MAPB for tOp-K) based on MAPB to deal with the Top-K frequent patterns for each length. Experimental results on real-world biological data demonstrate the superiority of the proposed algorithms in running time and space consumption and also show that the pattern matching approach can be employed to mine special frequent patterns effectively.
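The idea of deriving the supports of all super-patterns from a compact summary of their prefix pattern can be sketched with plain arrays. For each pattern, the sketch keeps only the number of occurrences ending at each position, and extends that array by one letter in a single pass. The function name `mine_patterns`, the absolute-count threshold `min_support`, the length cap `max_len`, and the array representation are all assumptions for illustration. This is a loose analogue of the "last layer" idea, not the Nettree structure or the MAPB, MAPD, or MAPBOK algorithms, and the paper's frequency definition may differ from a raw occurrence count.

```python
from typing import Dict, List


def mine_patterns(seq: str, min_gap: int, max_gap: int,
                  min_support: int, max_len: int) -> Dict[str, int]:
    """Breadth-first search over patterns with a periodic gap of
    min_gap..max_gap wildcards between consecutive letters.
    Returns every pattern of length <= max_len with support >= min_support."""
    alphabet = sorted(set(seq))
    n = len(seq)
    # layer[P][j] = number of occurrences of P whose last letter is at position j
    layer: Dict[str, List[int]] = {
        c: [1 if s == c else 0 for s in seq] for c in alphabet
    }
    frequent: Dict[str, int] = {}
    for length in range(1, max_len + 1):
        next_layer: Dict[str, List[int]] = {}
        for pat, ends in layer.items():
            support = sum(ends)
            if support == 0:
                continue  # a pattern with no occurrence has no occurring extension
            if support >= min_support:
                frequent[pat] = support
            if length == max_len:
                continue
            for c in alphabet:
                ext = [0] * n
                for i, ways in enumerate(ends):
                    if ways == 0:
                        continue
                    for j in range(i + 1 + min_gap, min(i + 1 + max_gap, n - 1) + 1):
                        if seq[j] == c:
                            ext[j] += ways
                next_layer[pat + c] = ext
        layer = next_layer
    return frequent


print(mine_patterns("ababab", 0, 1, 3, 2))  # {'a': 3, 'b': 3, 'ab': 3}
```

Only zero-support patterns are pruned here, because with raw occurrence counts a longer pattern can have more occurrences than its prefix.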


Fei Xie - One of the best experts on this subject based on the ideXlab platform.

Efficient sequential pattern mining with Wildcards for keyphrase extraction

Knowledge-Based Systems, 2017

Co-Authors: Fei Xie, Xin Dong Wu, Xing Quan Zhu


Sharma V. Thankachan - One of the best experts on this subject based on the ideXlab platform.

Document Retrieval with One Wildcard

Mathematical Foundations of Computer Science, 2014

Co-Authors: Moshe Lewenstein, Yakov Nekrich, J. Ian Munro, Sharma V. Thankachan

Abstract:

In this paper we extend several well-known document listing problems to the case when documents contain a substring that approximately matches the query pattern. We study the scenario when the query string can contain a Wildcard symbol that matches any alphabet symbol; all documents that match a query pattern with one Wildcard must be enumerated. We describe a linear space data structure that reports all documents containing a substring P in \(O(|P|+\sigma \sqrt{\log\log \log n} + \mathtt{docc})\) time, where σ is the alphabet size and docc is the number of listed documents. We also describe a succinct solution for this problem.
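The query described above can be stated as a short brute-force sketch: given a collection of documents and a pattern containing a wildcard, list every document that has a matching substring. The function name `list_documents` and the `?` wildcard symbol are assumptions for illustration. The linear-space and succinct data structures from the paper are not reproduced here.

```python
from typing import List, Sequence


def list_documents(docs: Sequence[str], pattern: str, wildcard: str = "?") -> List[int]:
    """Return the indices of documents that contain a substring matching
    `pattern`, where the wildcard symbol matches any single character."""
    m = len(pattern)

    def contains(doc: str) -> bool:
        return any(
            all(p == wildcard or p == c for p, c in zip(pattern, doc[i:i + m]))
            for i in range(len(doc) - m + 1)
        )

    return [idx for idx, doc in enumerate(docs) if contains(doc)]


docs = ["banana", "bandana", "cabana"]
print(list_documents(docs, "an?n"))  # [0]
print(list_documents(docs, "d?n"))   # [1]
print(list_documents(docs, "a?a"))   # [0, 1, 2]
```

Each document is reported once regardless of how many matching substrings it contains, which is what distinguishes document listing from reporting every occurrence.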

Compressed text indexing with Wildcards

Journal of Discrete Algorithms, 2013

Co-Authors: Tsung-han Ku, Rahul Shah, Sharma V. Thankachan, Jeffrey Scott Vitter

SPIRE - Compressed text indexing with Wildcards

String Processing and Information Retrieval, 2011

Co-Authors: Tsung-han Ku, Rahul Shah, Sharma V. Thankachan, Jeffrey Scott Vitter

