Analytical Processing

The Experts below are selected from a list of 27822 Experts worldwide ranked by ideXlab platform

Philip S Yu - One of the best experts on this subject based on the ideXlab platform.

  • LOCUST: An Online Analytical Processing Framework for High Dimensional Classification of Data Streams
    2008 IEEE 24th International Conference on Data Engineering, 2008
    Co-Authors: Charu C. Aggarwal, Philip S Yu
    Abstract:

    In recent years, data streams have become ubiquitous because of advances in hardware and software technology. The ability to adapt conventional mining problems to data streams is a great challenge in a data stream environment. Many data streams are inherently high dimensional, which creates a special challenge for data mining algorithms. In this paper, we consider the problem of classification of high dimensional data streams. For the high dimensional case, even traditional classifiers do not work very well on fixed data sets. We discuss a number of insights for the intractability of the high dimensional case. We use these insights to propose a new classification method (LOCUST) which avoids many of these weaknesses. The key is to develop a subspace-based instance centered classification approach which can be implemented efficiently for a fast data stream. We propose a methodology to effectively process the data stream in an organized way, so that the intermediate data structures can be used to sample locally discriminative subspaces for the classification process. We show that LOCUST is able to work effectively in the high dimensional case, and is also flexible in terms of increased robustness with greater resource availability.

  • Graph OLAP: Towards online analytical processing on graphs
    Proceedings - IEEE International Conference on Data Mining, ICDM, 2008
    Co-Authors: Chen Chen, Feida Zhu, Jia Wei Han, Xifeng Yan, Philip S Yu
    Abstract:

    OLAP (On-Line Analytical Processing) is an important notion in data analysis. Recently, more and more graph or networked data sources have come into being. There exists a similar need to deploy graph analysis from different perspectives and with multiple granularities. However, traditional OLAP technology cannot handle such demands because it does not consider the links among individual data tuples. In this paper, we develop a novel graph OLAP framework, which presents a multi-dimensional and multi-level view over graphs. The contributions of this work are two-fold. First, starting from basic definitions, i.e., what are dimensions and measures in the graph OLAP scenario, we develop a conceptual framework for data cubes on graphs. We also look into different semantics of OLAP operations, and classify the framework into two major subcases: informational OLAP and topological OLAP. Then, with more emphasis on informational OLAP (topological OLAP will be covered in a future study due to the lack of space), we show how a graph cube can be materialized by calculating a special kind of measure called an aggregated graph, and how to implement it efficiently. This includes both full materialization and partial materialization, where constraints are enforced to obtain an iceberg cube. We can see that the aggregated graphs, which depend on the graph properties of the underlying networks, are much harder to compute than their traditional OLAP counterparts, due to the increased structural complexity of the data. Empirical studies show insightful results on real datasets and demonstrate the efficiency of our proposed optimizations.
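
The idea of an aggregated graph in the abstract above can be illustrated with a minimal sketch. It assumes that an informational roll-up groups nodes by one dimension value and sums the weights of the edges between groups, and that an iceberg constraint is a minimum weight threshold. The function names (`roll_up`, `iceberg`), the node names, and the `lab_a`/`lab_b` groups are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def roll_up(edges, group_of):
    """Collapse nodes into groups and sum edge weights between groups.

    edges: iterable of (node_u, node_v, weight) for an undirected graph.
    group_of: maps each node to its value on the roll-up dimension.
    """
    aggregated = defaultdict(int)
    for u, v, weight in edges:
        # Sort the pair so (a, b) and (b, a) land in the same cell.
        key = tuple(sorted((group_of[u], group_of[v])))
        aggregated[key] += weight
    return dict(aggregated)


def iceberg(aggregated, min_weight):
    """Keep only aggregated edges whose weight meets the threshold."""
    return {k: w for k, w in aggregated.items() if w >= min_weight}


# Hypothetical collaboration graph: edge weight = number of joint papers.
edges = [
    ("ann", "bob", 2),
    ("ann", "cat", 1),
    ("bob", "dan", 3),
    ("cat", "dan", 1),
]
group_of = {"ann": "lab_a", "bob": "lab_a", "cat": "lab_b", "dan": "lab_b"}

aggregated = roll_up(edges, group_of)
frequent = iceberg(aggregated, min_weight=2)
```

Here `aggregated` is the coarser graph over the two groups, and `frequent` drops the cells that fall below the threshold, which is the sense in which partial materialization avoids storing every cell.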

Kedar Sambhoos - One of the best experts on this subject based on the ideXlab platform.

  • Data association and graph analytical processing of hard and soft intelligence data
    Proceedings of the 16th International Conference on Information Fusion, 2013
    Co-Authors: Ketan Date, G A Gross, Richie Nagi, Sushama Khopkar, Kedar Sambhoos
    Abstract:

    In traditional data fusion, hard physical sensor data has been the main source of information. This has changed during the past decade, against the backdrop of counterinsurgency (COIN). In the COIN environment, the majority of information comes from human sources (soft data). The source of this information can be human informants or soldiers conducting reconnaissance in the field. This human-sourced soft data is filled with vast amounts of valuable information. Recently, a large number of Natural Language Processing techniques have been developed to process this soft data into the form of relational graphs. In this paper, we describe various graph analytical techniques that can be applied to the fusion of hard and soft information and to an analyst's understanding of the situations of interest. The processing elements exhibited in this paper are association of entities and relations in observational hard and soft data graphs to form the cumulative data graph, situation assessment via graph matching of situations of interest against the cumulative data graph, and social network analysis to identify and extract high-value individuals in the network. To illustrate these graph analytic tools, we have used the Sunni message thread of SYNCOIN, consisting of 114 soft messages and 4 hard data reports. The value of this work has been demonstrated with detailed analysis and examples from the aforementioned dataset.
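
As a rough illustration of the social network analysis step mentioned above, the sketch below ranks entities in a relation graph by degree, that is, by how many relations they take part in. The abstract does not say which measure the authors use, so degree centrality is an assumption here, and the function name `degree_ranking` and the entity labels `p1` to `p4` are hypothetical.

```python
from collections import Counter


def degree_ranking(relations):
    """Rank entities by the number of relations they appear in.

    relations: iterable of (entity_a, entity_b) pairs from a data graph.
    Returns a list of (entity, degree) sorted by descending degree.
    """
    degree = Counter()
    for a, b in relations:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common()


# Hypothetical cumulative data graph with four entities.
relations = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p2", "p3")]
ranking = degree_ranking(relations)
top_entity, top_degree = ranking[0]
```

In practice an analyst would likely combine several measures (for example betweenness as well as degree), since a single count can overstate the importance of entities that are merely mentioned often.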

J. Ertlschweiger - One of the best experts on this subject based on the ideXlab platform.

  • A prototype metadata database for online analytical processing of environmental data
    Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150), 1997
    Co-Authors: Harold A. Geller, Sue Conger, J. Ertlschweiger
    Abstract:

    We present preliminary results on the development of a prototype database system demonstrating the utility of the integration of environmental metadata within an online analytical processing environment. We utilized existing data derived from CD-ROMs of the National Snow and Ice Data Center (NSIDC), the Consortium for International Earth Science Information Network (CIESIN) and the US Geological Survey (USGS). We populated a prototype metadata database whose architecture facilitates the scientific and statistical investigations of geophysical parameters associated with the polar regions, allowing for data fusion from other regions and Earth science disciplines, facilitating interdisciplinary studies. The user can extract information combining the knowledge of two disparate sources of geophysical data to allow a query that would result in a useful product. Furthermore, we demonstrate the utility of allowing access to this database via the World Wide Web using an interface to the underlying Oracle database management system.

Yaoxin Duan - One of the best experts on this subject based on the ideXlab platform.

  • Supporting Data Stream Analytical Processing in Vehicular Sensor Networks
    2019 IEEE Intelligent Transportation Systems Conference (ITSC), 2019
    Co-Authors: Yaoxin Duan
    Abstract:

    In the past decade, Vehicular Sensor Network (VSN) technology has emerged as a promising technique to support Intelligent Transportation Systems (ITSs). Using the information collection and communication capabilities provided by VSNs, data can first be collected by in-vehicle sensors and then uploaded to the infrastructure to support ITS applications. However, to get a global view of the status of the road, many ITS applications adopt a centralized approach, which requires collecting data from VSNs at a central server. In addition, to provide timely services, data has to be updated by vehicles continuously. As a result, a massive amount of data could be generated by vehicles and transmitted in the network, which may exhaust the limited wireless communication bandwidth. In this work, we propose a data stream analytical processing framework for VSNs named Streaming Vehdoop (SVehdoop). To reduce bandwidth consumption, SVehdoop schedules part of the data processing tasks to where the data is located. Specifically, SVehdoop utilizes the computing capability of vehicles to efficiently process the collected data over a large number of vehicles in a distributed manner. A dynamic clustering algorithm, named the Streaming Vehdoop Clustering (SVC) algorithm, is tailor-designed for SVehdoop not only to consider vehicle mobility to form stable clusters, but also to take account of data aggregation and data parallelism. Comprehensive experiments have been conducted to demonstrate the efficiency of SVehdoop and the proposed SVC algorithm.

  • Vehdoop: A Scalable Analytical Processing Framework for Vehicular Sensor Networks
    IEEE Transactions on Intelligent Transportation Systems, 2019
    Co-Authors: Yaoxin Duan, Sarana Nutanong
    Abstract:

    The vehicular sensor network (VSN) technology empowers intelligent transportation systems (ITSs) to support a wide range of road safety and traffic management applications. By taking advantage of the information collection and communication capabilities offered by VSNs, information, such as speed, travel time, dash-camera video, and so on, can be gathered from sensors embedded in vehicles and then delivered to the infrastructure to support ITS applications. The explosive growth in the availability and variety of sensor instruments as well as the number of vehicles provides us with the opportunity to create large-scale ITS applications, which demand large-scale data processing. In order to support large-scale data processing, Google proposed the MapReduce framework. The MapReduce framework provides scalability in a large-scale data cluster by performing aggregate computations as close to the data source as possible. However, supporting ITS applications over VSN is not just a matter of simply applying the existing MapReduce framework to VSN due to the limited wireless bandwidth and the highly dynamic network topology. In this paper, we propose an analytical processing framework for VSNs called Vehdoop. Vehdoop utilizes the computing capability of vehicles to efficiently process sensor data in parallel across a large number of vehicles in a decentralized manner. We conducted extensive experiments using vehicle trajectories generated from Simulation of Urban MObility (SUMO) and a network simulator, NS-3, to simulate vehicle-to-vehicle and vehicle-to-infrastructure communications. The experimental results demonstrate the superiority of Vehdoop.
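
The principle of "performing aggregate computations as close to the data source as possible" can be sketched in a few lines. This is a generic MapReduce-style illustration, not Vehdoop's actual algorithm: it assumes each vehicle reduces its raw speed readings to a `(sum, count)` pair before transmitting, and that the infrastructure merges those pairs into a global average. The names `local_aggregate`, `merge`, and the vehicle IDs are hypothetical.

```python
def local_aggregate(readings):
    """Run on one vehicle: reduce raw speed readings to (sum, count)."""
    return (sum(readings), len(readings))


def merge(partials):
    """Run at the infrastructure: combine partial results into a mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical raw speed readings (km/h) held locally by three vehicles.
vehicle_readings = {
    "v1": [50.0, 52.0, 54.0],
    "v2": [60.0, 62.0],
    "v3": [40.0],
}

# Each vehicle sends one small pair instead of its full reading list.
partials = [local_aggregate(r) for r in vehicle_readings.values()]
average_speed = merge(partials)
```

Sending a `(sum, count)` pair rather than a per-vehicle mean keeps the merge exact when vehicles hold different numbers of readings, while the message size stays constant no matter how many readings each vehicle collected.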

Chen Chen - One of the best experts on this subject based on the ideXlab platform.

  • Graph OLAP: Towards online analytical processing on graphs
    Proceedings - IEEE International Conference on Data Mining, ICDM, 2008
    Co-Authors: Chen Chen, Feida Zhu, Jia Wei Han, Xifeng Yan, Philip S Yu
    Abstract:

    OLAP (On-Line Analytical Processing) is an important notion in data analysis. Recently, more and more graph or networked data sources have come into being. There exists a similar need to deploy graph analysis from different perspectives and with multiple granularities. However, traditional OLAP technology cannot handle such demands because it does not consider the links among individual data tuples. In this paper, we develop a novel graph OLAP framework, which presents a multi-dimensional and multi-level view over graphs. The contributions of this work are two-fold. First, starting from basic definitions, i.e., what are dimensions and measures in the graph OLAP scenario, we develop a conceptual framework for data cubes on graphs. We also look into different semantics of OLAP operations, and classify the framework into two major subcases: informational OLAP and topological OLAP. Then, with more emphasis on informational OLAP (topological OLAP will be covered in a future study due to the lack of space), we show how a graph cube can be materialized by calculating a special kind of measure called an aggregated graph, and how to implement it efficiently. This includes both full materialization and partial materialization, where constraints are enforced to obtain an iceberg cube. We can see that the aggregated graphs, which depend on the graph properties of the underlying networks, are much harder to compute than their traditional OLAP counterparts, due to the increased structural complexity of the data. Empirical studies show insightful results on real datasets and demonstrate the efficiency of our proposed optimizations.