Data Warehouses

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 58,107 Experts worldwide, ranked by the ideXlab platform

Pedro Furtado - One of the best experts on this subject based on the ideXlab platform.

  • Overcoming the scalability limitations of parallel star schema Data Warehouses
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012
    Co-Authors: João Costa, José Cecílio, Pedro Martins, Pedro Furtado
    Abstract:

    Most Data Warehouses (DW) are stored in Relational Database Management Systems (RDBMS) using a star-schema model. While this model yields a trade-off between performance and storage requirements, huge Data Warehouses experience performance problems. Although parallel shared-nothing architectures improve on this matter through a divide-and-conquer approach, issues related to parallelizing join operations limit the amount of improvement, since they have implications for placement, the need to replicate Data, and/or on-the-fly repartitioning. In this paper, we show how these limitations can be overcome by replacing the star schema with a universal relation approach for more efficient and scalable parallelization. We evaluate the proposed approach using the TPC-H benchmark to demonstrate that it provides both highly predictable response times and almost optimal speedup.
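
The contrast the abstract draws can be illustrated with a small sketch (table and column names are invented for illustration; this is not the paper's actual schema): a star schema keeps facts and dimensions separate, so queries join at run time, while a universal relation pre-joins them into one wide table whose horizontal slices can be aggregated on each node with no inter-node join or repartitioning.

```python
# Star schema: fact rows reference dimension keys (hypothetical mini example).
customers = {1: "EUROPE", 2: "ASIA"}                 # dimension table
sales = [(1, 100.0), (2, 250.0), (1, 50.0)]          # fact table: (cust_key, price)

# Star-schema query: aggregating revenue per region requires a join.
star_result = {}
for cust_key, price in sales:
    region = customers[cust_key]                     # the run-time join step
    star_result[region] = star_result.get(region, 0.0) + price

# Universal relation: the join is materialized once, at load time.
universal = [(customers[k], price) for k, price in sales]

# Any horizontal slice of `universal` can now be aggregated independently on
# its node; partial sums are merged without any inter-node join.
ur_result = {}
for region, price in universal:
    ur_result[region] = ur_result.get(region, 0.0) + price

assert star_result == ur_result
```

The price of the universal relation is storage redundancy, which is exactly the trade-off against scalability the paper evaluates.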

  • Node Partitioned Data Warehouses
    Data Warehousing and Mining, 2008
    Co-Authors: Pedro Furtado
    Abstract:

    Data Warehouses (DWs) with large quantities of Data present major performance and scalability challenges, and parallelism can be used for major performance improvement in this context. However, instead of costly specialized parallel hardware and interconnections, we focus on low-cost standard computing nodes, possibly in a non-dedicated local network. In this environment, special care must be taken with partitioning and processing. We use experimental evidence to analyze the shortcomings of a basic horizontal partitioning strategy designed for that environment, then propose and test improvements that allow efficient placement for the low-cost Node Partitioned Data Warehouse. We show experimentally that extra overheads related to processing large replicated relations and repartitioning requirements between nodes can significantly degrade speedup performance for many query patterns. We analyze a simple, easy-to-apply partitioning and placement decision that achieves good performance improvement results. Our experiments and discussion provide important insight into partitioning and processing issues for Data Warehouses in shared-nothing environments.
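
The basic placement strategy the abstract refers to can be sketched as follows (node count, table contents, and key choice are all invented for illustration): the large fact table is hash-partitioned across nodes on a key, while small dimension tables are replicated to every node, so fact-to-dimension joins run locally and only partial aggregates travel over the network.

```python
N_NODES = 4  # illustrative cluster size

def node_of(fact_key: int) -> int:
    """Assign a fact row to a node by hashing its partitioning key."""
    return hash(fact_key) % N_NODES

# Hypothetical fact table: (order_key, revenue measure).
facts = [(order_key, order_key * 10.0) for order_key in range(1000)]
dimension = {"region": "replicated to every node"}   # small, copied everywhere

# Partition facts row-wise; each node also holds a full dimension copy.
nodes = [[] for _ in range(N_NODES)]
for key, measure in facts:
    nodes[node_of(key)].append((key, measure))

# An aggregate query now runs node-locally; only the partial aggregates
# are shipped and merged, never the fact rows themselves.
partials = [sum(m for _, m in part) for part in nodes]
total = sum(partials)
assert total == sum(m for _, m in facts)
```

The overheads the paper measures arise precisely when a query's join key does not match the partitioning key, forcing repartitioning or large replicated relations.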

  • DASFAA - Large relations in node-partitioned Data Warehouses
    Database Systems for Advanced Applications, 2005
    Co-Authors: Pedro Furtado
    Abstract:

    A cheap shared-nothing context can be used to provide significant speedup on large Data Warehouses, but partitioning and placement decisions are important in such systems, as repartitioning requirements can result in much less-than-linear speedup. This problem can be minimized if query workload and schemas are inputs to placement decisions. In this paper we analyze, with the help of a cost model, the problem of handling large relations in a node-partitioned Data warehouse (NPDW) under a basic placement strategy that partitions facts horizontally and replicates dimensions. We then propose a strategy to improve performance and present both analytical and TPC-H results.

  • experimental evidence on partitioning in parallel Data Warehouses
    Data Warehousing and OLAP, 2004
    Co-Authors: Pedro Furtado
    Abstract:

    Parallelism can be used for major performance improvement in large Data Warehouses (DW) with performance and scalability challenges. A simple low-cost shared-nothing architecture with horizontally fully-partitioned facts can be used to speed up the response time of the Data warehouse significantly. However, extra overheads related to processing large replicated relations and repartitioning requirements between nodes can significantly degrade speedup performance for many query patterns if special care is not taken during placement to minimize such overheads. In this paper we show these problems experimentally with the help of the performance evaluation benchmark TPC-H and identify simple modifications that can minimize such undesirable extra overheads. We analyze experimentally a simple and easy-to-apply partitioning and placement decision that achieves good performance improvement results.

Jerome Darmont - One of the best experts on this subject based on the ideXlab platform.

  • S4: A New Secure Scheme for Enforcing Privacy in Cloud Data Warehouses
    2017
    Co-Authors: Somayeh Moghadam, Jerome Darmont, Gérald Gavin
    Abstract:

    Outsourcing Data into the cloud has become popular thanks to the pay-as-you-go paradigm. However, this practice raises privacy concerns. The conventional way to achieve Data privacy is to encrypt sensitive Data before outsourcing. When Data are encrypted, a trade-off must be achieved between security and efficient query processing. Existing solutions that adopt multiple encryption schemes induce a heavy overhead in terms of Data storage and query performance, and are not suited for cloud Data Warehouses. In this paper, we propose an efficient additive encryption scheme (S4) based on Shamir's secret sharing for securing Data Warehouses in the cloud. S4 addresses the shortcomings of existing approaches by reducing overhead while still enforcing good Data privacy. Experimental results show the efficiency of S4 in terms of computation and storage overhead with respect to existing solutions.
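
The additive property of Shamir's secret sharing that a scheme like S4 builds on can be sketched as follows (the prime, threshold, and data are invented; this is generic Shamir sharing, not the actual S4 construction from the paper): shares are points on a random polynomial whose constant term is the secret, and adding shares point-wise yields shares of the sum of the secrets, so a SUM over an encrypted measure column can be reconstructed from summed shares without decrypting individual values.

```python
import random

P = 2_147_483_647        # a prime modulus; the field size is an assumption
T, N = 2, 3              # threshold t and number of shares n (illustrative)

def share(secret: int) -> list[tuple[int, int]]:
    """Split `secret` into N points on a random degree-(T-1) polynomial."""
    coeffs = [secret] + [random.randrange(P) for _ in range(T - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, N + 1)]

def reconstruct(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

values = [120, 75, 305]                      # a measure column, in the clear
shared = [share(v) for v in values]          # what the cloud would store

# Cloud-side aggregation: add shares component-wise, never seeing the data.
summed = [(x, sum(s[i][1] for s in shared) % P)
          for i, (x, _) in enumerate(shared[0])]

assert reconstruct(summed[:T]) == sum(values)   # client recovers the SUM
```

Any T of the N summed shares suffice for reconstruction, which is what lets the warehouse tolerate unavailable or untrusted cloud nodes.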

  • Benchmarking XML Data Warehouses
    2016
    Co-Authors: Hadj Mahboubi, Jerome Darmont
    Abstract:

    With the emergence of XML as a new standard for representing business Data, new decision-support applications (namely, XML Data Warehouses) are being developed. To ensure their feasibility, the issue of performance must be addressed. Performance in general, and the efficiency of performance optimization techniques in particular, is usually assessed with the help of benchmarks. However, there is, to the best of our knowledge, no XML decision-support benchmark. In this paper, we present the XML Warehouse Benchmark (XWB), which aims at filling this gap. XWB is based on an original reference model for XML Data Warehouses, and proposes a test XML Data warehouse and its associated XQuery decision-support workload that are derived from the well-known relational decision-support benchmark TPC-H. Though at an early stage of development, XWB has been successfully used to test the efficiency of indexing and view materialization techniques in XML Data Warehouses.

  • A Join Index for XML Data Warehouses
    2016
    Co-Authors: Hadj Mahboubi, Kamel Aouiche, Jerome Darmont
    Abstract:

    XML Data Warehouses form an interesting basis for decision-support applications that exploit complex Data. However, native-XML Database management systems (DBMSs) currently suffer from limited performance, and it is necessary to find ways to optimize them. In this paper, we propose a new join index that is specifically adapted to the multidimensional architecture of XML Warehouses. It eliminates join operations while preserving the information contained in the original warehouse. A theoretical study and experimental results demonstrate the efficiency of our join index. They also show that native XML DBMSs can compete with XML-compatible, relational DBMSs when warehousing and analyzing XML Data.
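
The general idea behind a join index can be illustrated with a minimal sketch (identifiers and data are invented, and plain Python dictionaries stand in for the paper's XML-specific structure): the fact-to-dimension join is precomputed once, so a query can filter facts by a dimension attribute with a lookup instead of performing the join at query time.

```python
# Hypothetical fact entries: (fact_id, measure).
facts = [("f1", 10), ("f2", 20), ("f3", 30)]
# Precomputed join result: fact_id -> dimension attribute value.
fact_to_dim = {"f1": "Books", "f2": "Music", "f3": "Books"}

# Build the join index: dimension value -> list of matching fact ids.
join_index: dict[str, list[str]] = {}
for fid, category in fact_to_dim.items():
    join_index.setdefault(category, []).append(fid)

# Query "total measure for Books" via the index, with no run-time join.
wanted = set(join_index["Books"])
total = sum(m for fid, m in facts if fid in wanted)
assert total == 40
```

The index preserves the join information, so the original warehouse documents never need to be traversed together at query time.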

  • Query Performance Optimization in XML Data Warehouses
    2010
    Co-Authors: Hadj Mahboubi, Jerome Darmont
    Abstract:

    XML Data Warehouses form an interesting basis for decision-support applications that exploit complex Data. However, native-XML Database management systems (DBMSs) currently suffer from limited performance, and it is necessary to find ways to optimize them. In this chapter, we present two such techniques. First, we propose a join index that is specifically adapted to the multidimensional architecture of XML Warehouses. It eliminates join operations while preserving the information contained in the original warehouse. Second, we present a strategy for selecting XML materialized views by clustering the query workload. To validate these proposals, we measure the response time of a set of decision-support XQueries over an XML Data warehouse, with and without using our optimization techniques. Our experimental results demonstrate their efficiency, even when queries are complex and Data are voluminous.

  • fragmenting very large xml Data Warehouses via k means clustering algorithm
    International Journal of Business Intelligence and Data Mining, 2009
    Co-Authors: Alfredo Cuzzocrea, Jerome Darmont, Hadj Mahboubi
    Abstract:

    XML Data sources are gaining popularity in the context of Business Intelligence and On-Line Analytical Processing (OLAP) applications, due to the amenities of XML in representing and managing complex and heterogeneous Data. However, XML-native Database systems currently suffer from limited performance, both in terms of volumes of manageable Data and query response time. Therefore, recent research efforts are focusing on horizontal fragmentation techniques, which are able to overcome the above limitations. However, classical fragmentation algorithms are not suitable to control the number of originated fragments, which instead plays a critical role in Data Warehouses. In this paper, we propose the use of the K-means clustering algorithm for effectively and efficiently supporting the fragmentation of very large XML Data Warehouses. We complement our analytical contribution with a comprehensive experimental assessment where we compare the efficiency of our proposal against existing fragmentation algorithms.
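
The key property the abstract highlights is that K-means takes the number of clusters as a parameter, so using it to drive horizontal fragmentation directly controls how many fragments are produced, unlike classical predicate-based fragmentation. A tiny 1-D K-means over a fact attribute sketches this (the data and choice of k are invented, and this stands in for the paper's XML-specific procedure, not reproduces it):

```python
def kmeans_1d(values, k, iters=20):
    """Partition numeric `values` into k clusters; return the cluster labels."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

# A hypothetical measure attribute of fact entries in the warehouse.
amounts = [5, 7, 6, 210, 220, 215, 990, 1005]
labels = kmeans_1d(amounts, k=3)

# Fragments = clusters; their number is bounded by the chosen k.
fragments = {c: [v for v, l in zip(amounts, labels) if l == c]
             for c in set(labels)}
assert len(fragments) <= 3
```

Fixing k up front is what makes the fragment count tunable to the number of available nodes, which predicate-based schemes cannot guarantee.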

Christian Koncilia - One of the best experts on this subject based on the ideXlab platform.

  • the comet metamodel for temporal Data Warehouses
    Conference on Advanced Information Systems Engineering, 2002
    Co-Authors: Johann Eder, Christian Koncilia, Tadeusz Morzy
    Abstract:

    "The Times They Are A-Changing" (B. Dylan), and with them the structures, schemas, master Data, etc. of Data Warehouses. For the correct treatment of such changes in OLAP queries, the orthogonality assumption of star schemas has to be abandoned. We propose the COMET model, which can represent not only changes of transaction Data, as usual in Data Warehouses, but also changes of schema and structure Data. The COMET model can then be used as the basis of OLAP tools that are aware of structural changes and permit correct query results spanning multiple periods and thus different versions of dimension Data. In this paper we present the COMET metamodel in detail with all necessary integrity constraints and show how the intervals of structural stability can be computed for all components of a Data warehouse.

  • changes of dimension Data in temporal Data Warehouses
    Data Warehousing and Knowledge Discovery, 2001
    Co-Authors: Johann Eder, Christian Koncilia
    Abstract:

    Time is one of the dimensions we frequently find in Data Warehouses, allowing comparisons of Data in different periods. In current multi-dimensional Data warehouse technology, changes of dimension Data cannot be represented adequately, since all dimensions are (implicitly) considered orthogonal. We propose an extension of the multi-dimensional Data model employed in Data Warehouses that copes correctly with changes in dimension Data: a temporal multi-dimensional Data model allows the registration of temporal versions of dimension Data. Mappings are provided to transfer Data between different temporal versions of the instances of dimensions and enable the system to correctly answer queries spanning multiple periods and thus different versions of dimension Data.

  • Evolution of Dimension Data in Temporal Data Warehouses
    2000
    Co-Authors: Johann Eder, Christian Koncilia
    Abstract:

    Multi-dimensional analysis is one of the most important applications of Data Warehouses, giving the possibility to aggregate and compare Data along dimensions relevant in the application domain. Typically, time is one of the dimensions we find in Data Warehouses, allowing comparisons of different periods. The instances of dimensions, however, change over time: countries unite and separate, products emerge and vanish, organizational structures evolve. In current Data warehouse technology these changes cannot be represented adequately, since all dimensions are (implicitly) considered orthogonal, putting heavy restrictions on the validity of OLAP queries spanning several periods. We propose an extension of the multi-dimensional Data model employed in Data Warehouses that copes correctly with changes in dimension Data: a temporal multi-dimensional Data model allows the registration of temporal versions of dimension Data. Mappings are provided to transfer Data between different temporal versions and enable the system to correctly answer queries spanning multiple periods and thus different versions of dimension Data.
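
The mechanism described across these abstracts, dimension instances carrying validity intervals plus mappings between structure versions, can be sketched minimally (names, dates, and structures are invented; this is an illustration of the idea, not the authors' model): a query spanning several periods maps facts recorded under an old dimension structure onto the current one, e.g. two countries that united.

```python
from datetime import date

# Dimension member versions with validity intervals: two countries that
# unite into one on 2000-01-01 (a hypothetical example).
versions = [
    ("East", date(1990, 1, 1), date(1999, 12, 31)),
    ("West", date(1990, 1, 1), date(1999, 12, 31)),
    ("United", date(2000, 1, 1), date(9999, 12, 31)),
]

# Mapping between structure versions: old members -> current member.
to_current = {"East": "United", "West": "United", "United": "United"}

def member_valid_at(name: str, when: date) -> bool:
    """Is this dimension member valid at the given instant?"""
    return any(n == name and start <= when <= end
               for n, start, end in versions)

# A query spanning 1999 and 2000 maps old facts onto today's structure
# before aggregating, so the multi-period comparison is meaningful.
facts = [("East", 1999, 10), ("West", 1999, 20), ("United", 2000, 50)]
totals: dict[str, int] = {}
for country, _, amount in facts:
    totals[to_current[country]] = totals.get(to_current[country], 0) + amount

assert totals == {"United": 80}
assert member_valid_at("East", date(1998, 6, 1))
assert not member_valid_at("East", date(2001, 1, 1))
```

Without the version mapping, the 1999 facts for "East" and "West" would simply be invisible to a query phrased against the current dimension instance.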

Johann Eder - One of the best experts on this subject based on the ideXlab platform.

  • the comet metamodel for temporal Data Warehouses
    Conference on Advanced Information Systems Engineering, 2002
    Co-Authors: Johann Eder, Christian Koncilia, Tadeusz Morzy
    Abstract:

    "The Times They Are A-Changing" (B. Dylan), and with them the structures, schemas, master Data, etc. of Data Warehouses. For the correct treatment of such changes in OLAP queries, the orthogonality assumption of star schemas has to be abandoned. We propose the COMET model, which can represent not only changes of transaction Data, as usual in Data Warehouses, but also changes of schema and structure Data. The COMET model can then be used as the basis of OLAP tools that are aware of structural changes and permit correct query results spanning multiple periods and thus different versions of dimension Data. In this paper we present the COMET metamodel in detail with all necessary integrity constraints and show how the intervals of structural stability can be computed for all components of a Data warehouse.

  • changes of dimension Data in temporal Data Warehouses
    Data Warehousing and Knowledge Discovery, 2001
    Co-Authors: Johann Eder, Christian Koncilia
    Abstract:

    Time is one of the dimensions we frequently find in Data Warehouses, allowing comparisons of Data in different periods. In current multi-dimensional Data warehouse technology, changes of dimension Data cannot be represented adequately, since all dimensions are (implicitly) considered orthogonal. We propose an extension of the multi-dimensional Data model employed in Data Warehouses that copes correctly with changes in dimension Data: a temporal multi-dimensional Data model allows the registration of temporal versions of dimension Data. Mappings are provided to transfer Data between different temporal versions of the instances of dimensions and enable the system to correctly answer queries spanning multiple periods and thus different versions of dimension Data.

  • Evolution of Dimension Data in Temporal Data Warehouses
    2000
    Co-Authors: Johann Eder, Christian Koncilia
    Abstract:

    Multi-dimensional analysis is one of the most important applications of Data Warehouses, giving the possibility to aggregate and compare Data along dimensions relevant in the application domain. Typically, time is one of the dimensions we find in Data Warehouses, allowing comparisons of different periods. The instances of dimensions, however, change over time: countries unite and separate, products emerge and vanish, organizational structures evolve. In current Data warehouse technology these changes cannot be represented adequately, since all dimensions are (implicitly) considered orthogonal, putting heavy restrictions on the validity of OLAP queries spanning several periods. We propose an extension of the multi-dimensional Data model employed in Data Warehouses that copes correctly with changes in dimension Data: a temporal multi-dimensional Data model allows the registration of temporal versions of dimension Data. Mappings are provided to transfer Data between different temporal versions and enable the system to correctly answer queries spanning multiple periods and thus different versions of dimension Data.

Philippe Beaune - One of the best experts on this subject based on the ideXlab platform.

  • Performance optimization of grid aggregation in spatial Data Warehouses
    INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2015
    Co-Authors: Myoung-ah Kang, François Pinet, Mehdi Zaamoune, Sandro Bimonte, Philippe Beaune
    Abstract:

    Storing and querying large volumes of spatial grids is a challenging problem. In this paper, we propose a method to optimize queries that aggregate raster grids stored in Databases. In our approach, we propose to estimate the result rather than compute it exactly. This approach reduces query execution time. One advantage of our method is that it does not require implementing or modifying functionalities of Database management systems. Our approach is based on a new Data structure and a specific model of SQL queries. Our work is applied here to relational Data Warehouses.
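
The estimation idea can be sketched as follows (grid shape, block size, and function names are invented; the paper's actual data structure and SQL query model are not reproduced here): instead of summing every raster cell inside a query window, use per-block precomputed sums and weight edge blocks by the fraction of their area that overlaps the window, trading exactness for speed.

```python
def block_sums(grid, b):
    """Precompute the sum of each b x b block of a square 2-D grid."""
    n = len(grid)
    return {(i, j): sum(grid[r][c]
                        for r in range(i, min(i + b, n))
                        for c in range(j, min(j + b, n)))
            for i in range(0, n, b) for j in range(0, n, b)}

def estimate_sum(sums, b, r0, r1, c0, c1):
    """Estimate the sum over rows [r0, r1) x cols [c0, c1) from block sums,
    weighting each block by the fraction of its area inside the window."""
    total = 0.0
    for (i, j), s in sums.items():
        rows = max(0, min(r1, i + b) - max(r0, i))
        cols = max(0, min(c1, j + b) - max(c0, j))
        total += s * (rows * cols) / (b * b)
    return total

n, b = 8, 4
grid = [[1] * n for _ in range(n)]      # a uniform raster: estimates are exact
sums = block_sums(grid, b)

assert estimate_sum(sums, b, 0, 8, 0, 8) == 64.0   # full-grid window
assert estimate_sum(sums, b, 0, 5, 0, 5) == 25.0   # window cutting edge blocks
```

On a uniform grid the estimate is exact; on real rasters its error depends on how much cell values vary within each block, which is the accuracy/performance trade-off the method accepts.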