Twig

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 38940 Experts worldwide ranked by ideXlab platform

Tok Wang Ling - One of the best experts on this subject based on the ideXlab platform.

  • Holistic Boolean-Twig Pattern Matching for Efficient XML Query Processing
    IEEE Transactions on Knowledge and Data Engineering, 2012
    Co-Authors: Dunren Che, Tok Wang Ling, Wen-chi Hou
    Abstract:

    Twig pattern matching is a critical operation for XML query processing, and the holistic computing approach has shown superior performance over other methods. Since Bruno et al. introduced the first holistic Twig join algorithm, TwigStack, numerous so-called holistic Twig join algorithms have been proposed. Yet practical XML queries often require support for more general Twig patterns, such as the ones that allow arbitrary occurrences of an arbitrary number of logical connectives (AND, OR, and NOT); such types of Twigs are referred to as B-Twigs (i.e., Boolean-Twigs) or AND/OR/NOT-Twigs. We have seen interesting work on generalizing the holistic Twig join approach to AND/OR-Twigs and AND/NOT-Twigs, but have not seen any further effort addressing the problem of AND/OR/NOT-Twigs at the full scale, which therefore forms the main theme of this paper. In this paper, we investigate novel mechanisms for efficient B-Twig pattern matching. In particular, we introduce “B-Twig normalization” as an important first step in our approach toward eventually conquering the complexity of B-Twigs, and then present BTwigMerge, the first holistic Twig join algorithm designed for B-Twigs. Both analytical and experimental results show that BTwigMerge is optimal for B-Twig patterns with AD (Ancestor-Descendant) edges and/or PC (Parent-Child) edges.

  • XSym - TP+Output: modeling complex output information in XML Twig pattern query
    Database and XML Technologies, 2010
    Co-Authors: Tok Wang Ling, Gillian Dobbie
    Abstract:

    The Twig pattern is considered a core pattern for XML queries. However, due to the limited expressivity of Twig pattern expressions, many queries that aim to find complex output information under one object cannot be expressed in a single Twig pattern. Instead, they have to be expressed as an XQuery expression, which is transformed into several Twig patterns linked by joins. To process such an XQuery query, we need to match multiple Twig patterns to the XML document, even though they are all centered on the same object. In this paper we analyze the characteristics of each query node, i.e. the purpose, optionality and occurrence, and define four types of nodes in a Twig pattern query to express output information, namely, output node, optional-output node, predicated-output node, and optional-predicated-output node. Then we propose the TP+Output expression to extend Twig pattern queries, to model complex output information based on the semantics of different node types. With TP+Output, queries with the four output types can be expressed in one TP+Output expression and processed more efficiently. We extend our previously proposed Twig pattern query processing algorithm, VERT, to process the TP+Output query, and demonstrate the performance improvement of using TP+Output to represent queries.

  • DEXA - Efficient processing of multiple XML Twig queries
    Lecture Notes in Computer Science, 2006
    Co-Authors: Huanzhang Liu, Tok Wang Ling
    Abstract:

    Finding all occurrences of a Twig pattern in an XML document is a core operation for XML query processing. The emergence of XML as a common mark-up language for data interchange has spawned great interest in techniques for filtering and content-based routing of XML data. In this paper, we aim to use the state-of-the-art holistic Twig join technique to address multiple Twig queries in a large-scale XML database. We propose a new Twig query technique which is specially tailored to match documents with large numbers of Twig pattern queries. We introduce the super-Twig to represent multiple Twig queries. Based on the super-Twig, we design a holistic Twig join algorithm, called MTwigStack, to find all matches for multiple Twig queries by scanning an XML document only once.

  • DASFAA - TwigStackList ¬: a holistic Twig join algorithm for Twig query with not-predicates on XML data
    Database Systems for Advanced Applications, 2006
    Co-Authors: Tok Wang Ling
    Abstract:

    As businesses and enterprises generate and exchange XML data more often, there is an increasing need for searching and querying XML data. Much research has been done on matching XML Twig queries. However, as far as we know, very little work has examined the efficient processing of XML Twig queries with not-predicates. In this paper, we propose a novel holistic Twig join algorithm, called TwigStackList ¬, which is designed for efficiently matching an XML Twig pattern with negation. We show that TwigStackList ¬ can identify a large query class for which I/O optimality is guaranteed. Finally, we run extensive experiments that validate our algorithm and show the efficiency and effectiveness of TwigStackList ¬.
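
The abstracts above refer to ancestor-descendant (AD) edges, parent-child (PC) edges, and not-predicates. As a minimal sketch, assuming the region labelling (start, end, level) that TwigStack-style algorithms rely on, the Python code below checks both edge types and evaluates one small pattern with a negated branch by brute force. It is a naive illustration only, not TwigStackList ¬ or BTwigMerge; the names `Region`, `encode`, and `match_with_negation` are hypothetical and do not come from the papers.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical region label: document-order start/end positions plus depth."""
    tag: str
    start: int
    end: int
    level: int


def encode(xml_text):
    """Assign (start, end, level) labels to every element in document order."""
    root = ET.fromstring(xml_text)
    out = []
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        start = counter
        idx = len(out)
        out.append(None)  # reserve the slot so the list stays in document order
        for child in elem:
            visit(child, level + 1)
        counter += 1
        out[idx] = Region(elem.tag, start, counter, level)

    visit(root, 0)
    return out


def is_ancestor(a, d):
    """AD edge: the region of d is strictly nested inside the region of a."""
    return a.start < d.start and d.end < a.end


def is_parent(p, c):
    """PC edge: an AD relationship that spans exactly one level."""
    return is_ancestor(p, c) and c.level == p.level + 1


def match_with_negation(nodes, root_tag, required_child, forbidden_desc):
    """Brute-force match of root_tag[required_child][not(.//forbidden_desc)]."""
    hits = []
    for n in nodes:
        if n.tag != root_tag:
            continue
        has_required = any(
            m.tag == required_child and is_parent(n, m) for m in nodes
        )
        has_forbidden = any(
            m.tag == forbidden_desc and is_ancestor(n, m) for m in nodes
        )
        if has_required and not has_forbidden:
            hits.append(n)
    return hits


if __name__ == "__main__":
    doc = (
        "<lib><book><title/><author/></book>"
        "<book><title/><note><draft/></note></book></lib>"
    )
    for hit in match_with_negation(encode(doc), "book", "title", "draft"):
        print(hit)
```

The brute-force loop compares every pair of nodes, which is the cost that holistic algorithms are designed to avoid; the sketch only shows what the structural predicates mean.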

Bongki Moon - One of the best experts on this subject based on the ideXlab platform.

  • HadoopXML: a suite for parallel processing of massive XML data with multiple Twig pattern queries
    Conference on Information and Knowledge Management, 2012
    Co-Authors: Hyebong Choi, Bongki Moon
    Abstract:

    The volume of XML data is tremendous in many areas, but especially in data logging and scientific areas. XML data in these areas accumulate over time as new data are continuously collected. It is a challenge to process massive XML data with multiple Twig pattern queries given by multiple users in a timely manner. We showcase HadoopXML, a system that simultaneously processes many Twig pattern queries for a massive volume of XML data with Hadoop. Specifically, HadoopXML provides an efficient way to process a single large XML file in parallel. It processes multiple Twig pattern queries simultaneously with a shared input scan. Users do not need to iterate M/R jobs for each query. HadoopXML also reduces many I/Os by enabling Twig pattern queries to share their path solutions with each other. Moreover, HadoopXML provides a sophisticated runtime load balancing scheme for fairly assigning multiple Twig pattern joins across nodes. With synthetic and real-world XML datasets, we demonstrate how efficiently HadoopXML processes many Twig pattern queries in a shared and balanced way.

  • Value-based predicate filtering of XML documents
    Data & Knowledge Engineering, 2008
    Co-Authors: Joonho Kwon, Bongki Moon, Praveen Rao, Sukho Lee
    Abstract:

    In recent years, publish-subscribe systems based on XML filtering have received much attention in ubiquitous computing environments and Internet applications. The main challenge is to process a large volume of content against millions of user subscriptions. Several XML filtering systems focus on the efficient processing of structural matching of user subscriptions represented as XPath Twig patterns. However, existing techniques provide limited or no support for Twig patterns that contain various operators in the value-based predicates. In this paper, we present the pFiST system that filters XML documents by transforming Twig patterns into sequences based on Prufer's method. This sequencing idea for XML filtering was first demonstrated by FiST [J. Kwon, P. Rao, B. Moon, S. Lee, FiST: scalable XML document filtering by sequencing Twig patterns, in: Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005, pp. 217-228]. The focus of pFiST is to support value-based predicates in Twig patterns in addition to matching their structure. The pFiST system supports equality and non-equality operators, and in addition can handle logical operators such as AND and OR in the value-based predicates. Extensive experimental results show that pFiST provides good performance over data sets with different characteristics.

  • Sequencing XML data and query Twigs for fast pattern matching
    ACM Transactions on Database Systems, 2006
    Co-Authors: Bongki Moon
    Abstract:

    We propose a new way of indexing XML documents and processing Twig patterns in an XML database. Every XML document in the database can be transformed into a sequence of labels by Prufer's method that constructs a one-to-one correspondence between trees and sequences. During query processing, a Twig pattern is also transformed into its Prufer sequence. By performing subsequence matching on the set of sequences in the database and performing a series of refinement phases that we have developed, we can find all the occurrences of a Twig pattern in the database. Our approach allows holistic processing of a Twig pattern without breaking the Twig into root-to-leaf paths and processing these paths individually. Furthermore, we show in the article that all correct answers are found without any false dismissals or false alarms. Experimental results demonstrate the performance benefits of our proposed techniques.

  • ICDE - PRIX: indexing and querying XML using Prufer sequences
    Proceedings. 20th International Conference on Data Engineering, 2004
    Co-Authors: Bongki Moon
    Abstract:

    We propose a new way of indexing XML documents and processing Twig patterns in an XML database. Every XML document in the database can be transformed into a sequence of labels by Prufer's method that constructs a one-to-one correspondence between trees and sequences. During query processing, a Twig pattern is also transformed into its Prufer sequence. By performing subsequence matching on the set of sequences in the database, and performing a series of refinement phases that we have developed, we can find all the occurrences of a Twig pattern in the database. Our approach allows holistic processing of a Twig pattern without breaking the Twig into root-to-leaf paths and processing these paths individually. Furthermore, we show that all correct answers are found without any false dismissals or false alarms. Experimental results demonstrate the performance benefits of our proposed techniques.
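
The two abstracts above describe turning trees into Prufer sequences and then filtering candidates by subsequence matching. The following Python sketch shows the classic Prufer construction (repeatedly remove the smallest-numbered leaf and record its neighbour) together with a plain subsequence test. It assumes nodes are numbered 1..n; the numbering scheme, label handling, and refinement phases used by PRIX are not given in the abstracts and are not reproduced here. The function names are hypothetical.

```python
import heapq


def prufer_sequence(edges, n):
    """Classic Prufer sequence (length n - 2) of a tree on nodes 1..n."""
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # Min-heap of current leaves so the smallest-numbered leaf is removed first.
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)

    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        neighbour = next(iter(adj[leaf]))  # a leaf has exactly one neighbour
        seq.append(neighbour)
        adj[neighbour].discard(leaf)
        adj[leaf].clear()
        if len(adj[neighbour]) == 1:
            heapq.heappush(leaves, neighbour)
    return seq


def is_subsequence(query, data):
    """True if query occurs in data in order, not necessarily contiguously."""
    it = iter(data)
    return all(any(q == d for d in it) for q in query)


if __name__ == "__main__":
    # Node 4 has children 1, 2, 3 and is attached to node 5.
    document = prufer_sequence([(1, 4), (2, 4), (3, 4), (4, 5)], 5)
    print(document, is_subsequence([4, 4], document))
```

A subsequence hit is only a necessary condition for a structural match, which is why the abstracts mention refinement phases after the matching step.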

Jian Liu - One of the best experts on this subject based on the ideXlab platform.

  • Efficient processing of Twig query with compound predicates in fuzzy XML
    Fuzzy Sets and Systems, 2013
    Co-Authors: Jian Liu
    Abstract:

    In order to find all occurrences of a Twig pattern in XML documents, a considerable number of Twig pattern matching algorithms have been proposed. Previous algorithms mainly focus on the conjunctive Twig queries whose sibling edges are only connected by AND connectives. However, meaningful Twig queries typically contain arbitrarily specified compound predicates including AND, OR and NOT in practical applications. Moreover, as far as we know, none of these Twig matching algorithms have examined the processing of Twig queries which contain all the compound predicates over fuzzy XML data. In this paper, we present the first study on evaluating Twig queries with AND, OR and NOT connectives in fuzzy XML. We propose a novel holistic Twig matching algorithm called LTwig for answering these complex queries. Our algorithm guarantees that the answers can be obtained by scanning the relevant data of the data streams associated with the nodes appearing in the Twig pattern only once. A comprehensive set of experiments is finally carried out to demonstrate the effectiveness and efficiency of our proposed approach.

  • Matching Twigs in fuzzy XML
    Information Sciences, 2011
    Co-Authors: Jian Liu, Li Yan
    Abstract:

    A considerable number of Twig pattern matching algorithms have been proposed to holistically process a Twig query. Those algorithms mainly focus on Twig pattern queries with AND-logic. However, there is often a need to process a Twig query with OR-predicates. Furthermore, the existing algorithms fall short in their ability to support Twig queries with OR-logic in fuzzy XML. To overcome this limitation, in this paper, we first introduce a novel encoding scheme to represent node information in fuzzy XML. Based on the encoding scheme, we then propose an effective algorithm for matching a Twig pattern query with AND/OR-logic in fuzzy XML. Our approach adopts a compact stack technique to process complicated Twig queries consisting of both AND-logic and OR-logic. More importantly, our method eliminates re-scanning unnecessary portions of XML documents and redundant intermediate results. Finally, the experimental results demonstrate the performance advantages of our approach.

  • CIKM - Efficient processing of Twig pattern matching in fuzzy XML
    Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09, 2009
    Co-Authors: Jian Liu, Li Yan
    Abstract:

    In order to find all occurrences of a Twig pattern in XML documents, a considerable number of Twig pattern matching algorithms have been proposed. However, previous work mainly focuses on Twig pattern queries under the complete semantics, while there is often a need to produce partial answers because XML data may have missing sub-elements. Furthermore, existing work falls short in its ability to support Twig pattern queries under different semantics in fuzzy XML. In this paper, we study the problem of Twig matches in fuzzy XML. We begin by introducing the extended region scheme to accurately and effectively represent node information in fuzzy XML. We then discuss the fuzzy query semantics and compute the membership information by using the Einstein operator instead of Zadeh's min-max technique. On this basis, we propose two efficient algorithms for querying Twigs under complete and incomplete semantics in fuzzy XML. The experimental results show that our proposed algorithms perform fuzzy Twig pattern matching efficiently.
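
The last abstract contrasts the Einstein operator with Zadeh's min-max technique for combining membership degrees. As a small sketch using the standard textbook definitions of both operator families, the Python code below compares them on the same inputs. How the paper actually combines memberships along a Twig match is not stated in the abstract, so the function names and the example values are illustrative assumptions.

```python
def zadeh_and(a, b):
    """Zadeh conjunction: minimum of the two membership degrees."""
    return min(a, b)


def zadeh_or(a, b):
    """Zadeh disjunction: maximum of the two membership degrees."""
    return max(a, b)


def einstein_and(a, b):
    """Einstein product: a*b / (1 + (1 - a)*(1 - b))."""
    return (a * b) / (1.0 + (1.0 - a) * (1.0 - b))


def einstein_or(a, b):
    """Einstein sum: (a + b) / (1 + a*b)."""
    return (a + b) / (1.0 + a * b)


if __name__ == "__main__":
    # Two hypothetical membership degrees attached to matched nodes.
    a, b = 0.5, 0.5
    print("Zadeh   :", zadeh_and(a, b), zadeh_or(a, b))
    print("Einstein:", einstein_and(a, b), einstein_or(a, b))
```

Unlike min and max, the Einstein operators depend on both arguments, so two partially certain nodes yield a conjunction lower than either degree alone.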

Derick Wood - One of the best experts on this subject based on the ideXlab platform.

  • DEXA - On the optimality of holistic algorithms for Twig queries
    Lecture Notes in Computer Science, 2003
    Co-Authors: Byron Choi, Malika Mahoui, Derick Wood
    Abstract:

    Streaming XML documents has many emerging applications. However, in this paper, we show that the restrictions imposed by data streaming are too restrictive for processing Twig queries, the core operation for XML query processing. The previously proposed algorithm TwigStack is an optimal algorithm for processing Twig queries with only descendant edges over streams of nodes. The cause of the suboptimality of the TwigStack algorithm is the structural recursion appearing in XML documents. We show that without relaxing the data streaming model, it is not possible to develop an optimal holistic algorithm for Twig queries. Also, the computation of the Twig queries is not memory bounded. This motivates us to study two variations of the data streaming model: (1) offline sorting is allowed and the algorithm is allowed to select the correct nodes to be streamed and (2) multiple scans on the data streams are allowed. We show the lower bounds of the two variations.

Wen-chi Hou - One of the best experts on this subject based on the ideXlab platform.

  • Holistic Boolean-Twig Pattern Matching for Efficient XML Query Processing
    IEEE Transactions on Knowledge and Data Engineering, 2012
    Co-Authors: Dunren Che, Tok Wang Ling, Wen-chi Hou
    Abstract:

    Twig pattern matching is a critical operation for XML query processing, and the holistic computing approach has shown superior performance over other methods. Since Bruno et al. introduced the first holistic Twig join algorithm, TwigStack, numerous so-called holistic Twig join algorithms have been proposed. Yet practical XML queries often require support for more general Twig patterns, such as the ones that allow arbitrary occurrences of an arbitrary number of logical connectives (AND, OR, and NOT); such types of Twigs are referred to as B-Twigs (i.e., Boolean-Twigs) or AND/OR/NOT-Twigs. We have seen interesting work on generalizing the holistic Twig join approach to AND/OR-Twigs and AND/NOT-Twigs, but have not seen any further effort addressing the problem of AND/OR/NOT-Twigs at the full scale, which therefore forms the main theme of this paper. In this paper, we investigate novel mechanisms for efficient B-Twig pattern matching. In particular, we introduce “B-Twig normalization” as an important first step in our approach toward eventually conquering the complexity of B-Twigs, and then present BTwigMerge, the first holistic Twig join algorithm designed for B-Twigs. Both analytical and experimental results show that BTwigMerge is optimal for B-Twig patterns with AD (Ancestor-Descendant) edges and/or PC (Parent-Child) edges.

  • Efficient processing of XML Twig pattern: a novel one-phase holistic solution
    Database and Expert Systems Applications, 2007
    Co-Authors: Zhewei Jiang, Wen-chi Hou, Cheng Luo, Qiang Zhu, Dunren Che
    Abstract:

    Modern Twig query evaluation algorithms usually first generate individual path matches and then stitch them together (through a "merge" operation) to form Twig matches. In this paper, we propose a one-phase holistic Twig evaluation algorithm based on the TwigStack algorithm. The proposed method applies a novel stack structure to preserve the holisticity of the Twig matches. Without generating intermediate path matches, our method avoids the storage of individual path matches and the path merge process. Experimental results confirm the advantages of our approach.
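
The abstract above describes the conventional two-phase approach, in which individual path matches are generated and then stitched together by a merge step. As a rough sketch of that merge step only, the Python code below joins the path matches of a two-branch Twig such as a[b][c] on their shared branching node. The tuple layout and the name `merge_path_matches` are hypothetical, and this is the phase the one-phase algorithm is designed to avoid, not the proposed method itself.

```python
from collections import defaultdict


def merge_path_matches(left_paths, right_paths):
    """Join (branch_node, leaf) path matches that share the same branch node.

    Each input is a list of 2-tuples; the result is a list of
    (branch_node, left_leaf, right_leaf) Twig matches.
    """
    # Index the right-hand path matches by their branching node.
    by_branch = defaultdict(list)
    for branch, leaf in right_paths:
        by_branch[branch].append(leaf)

    matches = []
    for branch, left_leaf in left_paths:
        for right_leaf in by_branch.get(branch, []):
            matches.append((branch, left_leaf, right_leaf))
    return matches


if __name__ == "__main__":
    # Hypothetical node identifiers for the paths a/b and a/c.
    a_b = [("a1", "b1"), ("a2", "b2")]
    a_c = [("a1", "c1"), ("a1", "c2")]
    print(merge_path_matches(a_b, a_c))
```

In this example the path match ("a2", "b2") is stored but never contributes to a Twig match, which illustrates the intermediate results a one-phase method does not need to keep.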