Data Warehouse Project

The experts below are selected from a list of 219 experts worldwide ranked by the ideXlab platform.

Alkis Simitsis - One of the best experts on this subject based on the ideXlab platform.

  • Ontology-driven conceptual design of ETL processes using graph transformations
    Journal on Data Semantics, 2009
    Co-Authors: Dimitrios Skoutas, Alkis Simitsis, Timos Sellis
    Abstract:

    One of the main tasks during the early steps of a data warehouse project is the identification of the appropriate transformations and the specification of inter-schema mappings from the source to the target data stores. This is a challenging task, requiring firstly the semantic and secondly the structural reconciliation of the information provided by the available sources. This task is a part of the Extract-Transform-Load (ETL) process, which is responsible for the population of the data warehouse. In this paper, we propose a customizable and extensible ontology-driven approach for the conceptual design of ETL processes. A graph-based representation is used as a conceptual model for the source and target data stores. We then present a method for devising flows of ETL operations by means of graph transformations. In particular, the operations comprising the ETL process are derived through graph transformation rules, the choice and applicability of which are determined by the semantics of the data with respect to an attached domain ontology. Finally, we present our experimental findings that demonstrate the applicability of our approach.

  • A method for the mapping of conceptual designs to logical blueprints for ETL processes
    Decision Support Systems, 2008
    Co-Authors: Alkis Simitsis, Panos Vassiliadis
    Abstract:

    Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. In previous work, we presented a modeling framework for ETL processes comprised of a conceptual model that concretely deals with the early stages of a data warehouse project, and a logical model that deals with the definition of data-centric workflows. In this paper, we describe the mapping of the conceptual model to the logical model. First, we identify how conceptual entities are mapped to logical entities. Next, we determine the execution order in the logical workflow using information adapted from the conceptual model. Finally, we provide a method for the transition from the conceptual model to the logical model.

  • DOLAP - Natural language reporting for ETL processes
    Proceeding of the ACM 11th international workshop on Data warehousing and OLAP - DOLAP '08, 2008
    Co-Authors: Alkis Simitsis, Dimitrios Skoutas, Malu Castellanos
    Abstract:

    The conceptual design of Extract-Transform-Load (ETL) processes is a crucial, burdensome, and challenging procedure that takes place in the early phases of a data warehouse project. Several models have been proposed for the conceptual design and representation of ETL processes, but all share two inconveniences: they require intensive human effort from the designers to create them, as well as technical knowledge from the business people to understand them. In previous work, we relaxed the former difficulty by working on the automation of the conceptual design leveraging Semantic Web technology. In this paper, we build upon our previous results and tackle the second issue by investigating the application of natural language generation techniques to the ETL environment. In particular, we provide a method for the representation of a conceptual ETL design as a narrative, which is the most natural means of communication and does not require knowledge of any specific model. We discuss how linguistic techniques can be used for the establishment of a common application vocabulary. Finally, we present a flexible and customizable template-based mechanism for generating natural language representations for the ETL process requirements and operations.

  • Ontology-based conceptual design of ETL processes for both structured and semi-structured data
    International Journal on Semantic Web and Information Systems, 2007
    Co-Authors: Dimitrios Skoutas, Alkis Simitsis
    Abstract:

    One of the main tasks in the early stages of a data warehouse project is the identification of the appropriate transformations and the specification of inter-schema mappings from the data sources to the data warehouse. In this article, we propose an ontology-based approach to facilitate the conceptual design of the back stage of a data warehouse. A graph-based representation is used as a conceptual model for the data stores, so that both structured and semi-structured data are supported and handled in a uniform way. The proposed approach is based on the use of Semantic Web technologies to semantically annotate the data sources and the data warehouse, so that mappings between them can be inferred, thereby resolving the issue of heterogeneity. Specifically, a suitable application ontology is created and used to annotate the data stores. The language used for describing the ontology is OWL-DL. Based on the provided annotations, a DL reasoner is employed to infer semantic correspondences and conflicts among the data stores, and to propose a set of conceptual operations for transforming data from the source data stores to the data warehouse.

  • Designing ETL processes using Semantic Web technologies
    Data Warehousing and OLAP, 2006
    Co-Authors: Dimitrios Skoutas, Alkis Simitsis
    Abstract:

    One of the most important tasks performed in the early stages of a data warehouse project is the analysis of the structure and content of the existing data sources and their intentional mapping to a common data model. Establishing the appropriate mappings between the attributes of the data sources and the attributes of the data warehouse tables is critical in specifying the required transformations in an ETL workflow. The selected data model should be suitable for facilitating the redefinition and revision efforts, typically occurring during the early phases of a data warehouse project, and serve as the means of communication between the involved parties. In this paper, we argue that ontologies constitute a very suitable model for this purpose and show how the usage of ontologies can enable a high degree of automation regarding the construction of an ETL design.
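
One of the entries above mentions determining the execution order of a logical ETL workflow from information in the conceptual model. As a rough illustration only, the sketch below orders activities with a standard topological sort; this is a generic technique swapped in for illustration, not the mapping method of the cited paper, and every activity name is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical ETL activities; each maps to the set of activities that must
# run before it (its data providers). None of these names come from the papers.
providers = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_customers": {"extract_customers"},
    "join_orders_customers": {"extract_orders", "clean_customers"},
    "load_fact_sales": {"join_orders_customers"},
}


def execution_order(deps):
    """Return one valid order in which every activity follows its providers."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(providers)
print(order)
```

Several orders can be valid for the same dependency graph; the sort only guarantees that providers come before their consumers.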

Dimitrios Skoutas - One of the best experts on this subject based on the ideXlab platform.

  • Ontology-driven conceptual design of ETL processes using graph transformations
    Journal on Data Semantics, 2009
    Co-Authors: Dimitrios Skoutas, Alkis Simitsis, Timos Sellis
    Abstract:

    One of the main tasks during the early steps of a data warehouse project is the identification of the appropriate transformations and the specification of inter-schema mappings from the source to the target data stores. This is a challenging task, requiring firstly the semantic and secondly the structural reconciliation of the information provided by the available sources. This task is a part of the Extract-Transform-Load (ETL) process, which is responsible for the population of the data warehouse. In this paper, we propose a customizable and extensible ontology-driven approach for the conceptual design of ETL processes. A graph-based representation is used as a conceptual model for the source and target data stores. We then present a method for devising flows of ETL operations by means of graph transformations. In particular, the operations comprising the ETL process are derived through graph transformation rules, the choice and applicability of which are determined by the semantics of the data with respect to an attached domain ontology. Finally, we present our experimental findings that demonstrate the applicability of our approach.

  • DOLAP - Natural language reporting for ETL processes
    Proceeding of the ACM 11th international workshop on Data warehousing and OLAP - DOLAP '08, 2008
    Co-Authors: Alkis Simitsis, Dimitrios Skoutas, Malu Castellanos
    Abstract:

    The conceptual design of Extract-Transform-Load (ETL) processes is a crucial, burdensome, and challenging procedure that takes place in the early phases of a data warehouse project. Several models have been proposed for the conceptual design and representation of ETL processes, but all share two inconveniences: they require intensive human effort from the designers to create them, as well as technical knowledge from the business people to understand them. In previous work, we relaxed the former difficulty by working on the automation of the conceptual design leveraging Semantic Web technology. In this paper, we build upon our previous results and tackle the second issue by investigating the application of natural language generation techniques to the ETL environment. In particular, we provide a method for the representation of a conceptual ETL design as a narrative, which is the most natural means of communication and does not require knowledge of any specific model. We discuss how linguistic techniques can be used for the establishment of a common application vocabulary. Finally, we present a flexible and customizable template-based mechanism for generating natural language representations for the ETL process requirements and operations.

  • Ontology-based conceptual design of ETL processes for both structured and semi-structured data
    International Journal on Semantic Web and Information Systems, 2007
    Co-Authors: Dimitrios Skoutas, Alkis Simitsis
    Abstract:

    One of the main tasks in the early stages of a data warehouse project is the identification of the appropriate transformations and the specification of inter-schema mappings from the data sources to the data warehouse. In this article, we propose an ontology-based approach to facilitate the conceptual design of the back stage of a data warehouse. A graph-based representation is used as a conceptual model for the data stores, so that both structured and semi-structured data are supported and handled in a uniform way. The proposed approach is based on the use of Semantic Web technologies to semantically annotate the data sources and the data warehouse, so that mappings between them can be inferred, thereby resolving the issue of heterogeneity. Specifically, a suitable application ontology is created and used to annotate the data stores. The language used for describing the ontology is OWL-DL. Based on the provided annotations, a DL reasoner is employed to infer semantic correspondences and conflicts among the data stores, and to propose a set of conceptual operations for transforming data from the source data stores to the data warehouse.

  • Designing ETL processes using Semantic Web technologies
    Data Warehousing and OLAP, 2006
    Co-Authors: Dimitrios Skoutas, Alkis Simitsis
    Abstract:

    One of the most important tasks performed in the early stages of a data warehouse project is the analysis of the structure and content of the existing data sources and their intentional mapping to a common data model. Establishing the appropriate mappings between the attributes of the data sources and the attributes of the data warehouse tables is critical in specifying the required transformations in an ETL workflow. The selected data model should be suitable for facilitating the redefinition and revision efforts, typically occurring during the early phases of a data warehouse project, and serve as the means of communication between the involved parties. In this paper, we argue that ontologies constitute a very suitable model for this purpose and show how the usage of ontologies can enable a high degree of automation regarding the construction of an ETL design.

  • DOLAP - Designing ETL processes using semantic web technologies
    Proceedings of the 9th ACM international workshop on Data warehousing and OLAP - DOLAP '06, 2006
    Co-Authors: Dimitrios Skoutas, Alkis Simitsis
    Abstract:

    One of the most important tasks performed in the early stages of a data warehouse project is the analysis of the structure and content of the existing data sources and their intentional mapping to a common data model. Establishing the appropriate mappings between the attributes of the data sources and the attributes of the data warehouse tables is critical in specifying the required transformations in an ETL workflow. The selected data model should be suitable for facilitating the redefinition and revision efforts, typically occurring during the early phases of a data warehouse project, and serve as the means of communication between the involved parties. In this paper, we argue that ontologies constitute a very suitable model for this purpose and show how the usage of ontologies can enable a high degree of automation regarding the construction of an ETL design.
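
Several abstracts above describe annotating source and warehouse attributes with terms of a shared ontology so that mappings between them can be inferred. The sketch below is a deliberately simplified stand-in: it pairs attributes by exact equality of their annotation terms, whereas the cited work uses OWL-DL and a DL reasoner. All attribute names and ontology terms are invented for illustration.

```python
# Hypothetical annotations: each attribute is tagged with one term of a shared
# application ontology. Names and terms are assumptions, not from the papers.
source_annotations = {
    "src.cust_nm": "CustomerName",
    "src.amt_usd": "SaleAmount",
    "src.fax_no": "FaxNumber",
}
target_annotations = {
    "dw.customer_name": "CustomerName",
    "dw.sale_amount": "SaleAmount",
    "dw.sale_date": "SaleDate",
}


def infer_mappings(source, target):
    """Pair source and target attributes annotated with the same ontology term."""
    by_term = {}
    for attr, term in source.items():
        by_term.setdefault(term, []).append(attr)
    # One (source, target) pair per shared term, in target declaration order.
    mappings = [(s, t) for t, term in target.items() for s in by_term.get(term, [])]
    # Target attributes that no source attribute can populate.
    unmapped_targets = [t for t, term in target.items() if term not in by_term]
    return mappings, unmapped_targets


mappings, unmapped = infer_mappings(source_annotations, target_annotations)
print(mappings)
print(unmapped)
```

A real reasoner would also exploit subclass and equivalence relations between terms; exact matching is only the degenerate case.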

Jacky Akoka - One of the best experts on this subject based on the ideXlab platform.

  • Combining objects with rules to represent aggregation knowledge in data warehouse and OLAP systems
    Data and Knowledge Engineering, 2011
    Co-Authors: Nicolas Prat, Isabelle Comyn-Wattiau, Jacky Akoka
    Abstract:

    Data warehouses are based on multidimensional modeling. Using On-Line Analytical Processing (OLAP) tools, decision makers navigate through and analyze multidimensional data. Typically, users need to analyze data at different aggregation levels (using roll-up and drill-down functions). Therefore, aggregation knowledge should be adequately represented in conceptual multidimensional models, and mapped in subsequent logical and physical models. However, current conceptual multidimensional models poorly represent aggregation knowledge, which (1) has a complex structure and dynamics and (2) is highly contextual. In order to account for the characteristics of this knowledge, we propose to represent it with objects (UML class diagrams) and rules in the Production Rule Representation language (PRR). Static aggregation knowledge is represented in the class diagrams, while rules represent the dynamics (i.e., how aggregation may be performed depending on context). We present the class diagrams, and a typology and examples of associated rules. We argue that this representation of aggregation knowledge enables an early modeling of user requirements in a data warehouse project. A prototype has been developed based on the Java Expert System Shell (Jess).

  • Combining objects with rules to represent aggregation knowledge in data warehouse and OLAP systems
    2009
    Co-Authors: Nicolas Prat, Isabelle Comyn-Wattiau, Jacky Akoka
    Abstract:

    Data warehouses are based on multidimensional modeling. Using On-Line Analytical Processing (OLAP) tools, decision makers navigate through and analyze multidimensional data. Typically, users need to analyze data at different aggregation levels (using roll-up and drill-down functions). Therefore, aggregation knowledge should be adequately represented in conceptual multidimensional models, and mapped in subsequent logical and physical models. However, current conceptual multidimensional models poorly represent aggregation knowledge, which (1) has a complex structure and dynamics and (2) is highly contextual. In order to account for the characteristics of this knowledge, we propose to represent it with objects (UML class diagrams) and rules in Production Rule Representation (PRR) language. Static aggregation knowledge is represented in the class diagrams, while rules represent the dynamics (i.e., how aggregation may be performed depending on context). We present the class diagrams, and a typology and examples of associated rules. We argue that this representation of aggregation knowledge allows an early modeling of user requirements in a data warehouse project.
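
The abstracts above argue that how a measure may be aggregated during roll-up depends on context. A minimal sketch of that idea follows, using a plain Python lookup table in place of the UML class diagrams and PRR rules the authors propose. The fact rows, measure names, and the choice of aggregate function per measure are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily facts: (month, day, units_sold, stock_level).
facts = [
    ("2011-01", 1, 10, 100),
    ("2011-01", 2, 5, 90),
    ("2011-02", 1, 7, 80),
    ("2011-02", 2, 3, 60),
]

# Assumed rule table: the aggregate function allowed for each measure when
# rolling up along the time dimension. Units sold add up over time, but a
# stock level does not, so it is averaged instead.
rules = {"units_sold": sum, "stock_level": mean}


def roll_up_to_month(rows):
    """Roll daily facts up to month level, applying the rule for each measure."""
    grouped = defaultdict(lambda: {name: [] for name in rules})
    for month, _day, units, stock in rows:
        grouped[month]["units_sold"].append(units)
        grouped[month]["stock_level"].append(stock)
    return {
        month: {name: rules[name](values) for name, values in measures.items()}
        for month, measures in grouped.items()
    }


monthly = roll_up_to_month(facts)
print(monthly)
```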

Matteo Golfarelli - One of the best experts on this subject based on the ideXlab platform.

  • DaWaK - Sprint planning optimization in agile data warehouse design
    Data Warehousing and Knowledge Discovery, 2012
    Co-Authors: Matteo Golfarelli, Stefano Rizzi, Elisa Turricchia
    Abstract:

    Agile methods have been increasingly adopted to make data warehouse design faster and nimbler. They divide a data warehouse project into sprints (iterations), and include a sprint planning phase that is critical to ensure project success. Several factors affect the optimality of a sprint plan, e.g., the estimated complexity, business value, and affinity of the elemental functionalities included in each sprint, which makes the planning problem difficult. In this paper we formalize the planning problem and propose an optimization model that, given the estimates made by the project team and a set of development constraints, produces an optimal sprint plan that maximizes the business value perceived by users. The planning problem is converted into a multi-knapsack problem with constraints, given a linear programming formulation, and solved using the IBM ILOG CPLEX Optimizer. Finally, the proposed approach is validated through effectiveness and efficiency tests.

  • DaWaK - Modern software engineering methodologies meet data warehouse design: 4WD
    Data Warehousing and Knowledge Discovery, 2011
    Co-Authors: Matteo Golfarelli, Stefano Rizzi, Elisa Turricchia
    Abstract:

    Data warehouse systems are characterized by a long and expensive development process that hardly meets the ambitious requirements of today's market. This suggests that some further investigation on the methodological issues related to data warehouse design is necessary, aimed at improving the development process from different points of view. In this paper we analyze the potential advantages arising from the application of modern software engineering methodologies to a data warehouse project and we propose 4WD, a design methodology that couples the main principles emerging from these methodologies to the peculiarities of data warehouse projects. The principles underlying 4WD are risk-based iteration, evolutionary and incremental prototyping, user involvement, component reuse, formal and light documentation, and automated schema transformation.

  • From User Requirements to Conceptual Design in Warehouse Design: A Survey
    Data Warehousing Design and Advanced Engineering Applications, 1
    Co-Authors: Matteo Golfarelli
    Abstract:

    Conceptual design and requirement analysis are two of the key steps within the data warehouse design process. They are to a great extent responsible for the success of a data warehouse project since, during these two phases, the expressivity of the multidimensional schemata is completely defined. This paper proposes a survey of the literature related to these design steps and points out pros and cons of the different techniques in order to help the reader to identify crucial choices and possible solutions more consciously. Particular attention will be devoted to emphasizing the relationships between the two steps describing how they can be jointly used fruitfully.
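
The sprint planning entry above casts planning as a multi-knapsack problem: assign functionalities to sprints so that business value is maximized without exceeding each sprint's capacity. The toy sketch below solves a tiny instance by exhaustive search, which is a stand-in for the linear programming formulation and CPLEX solver named in the abstract. It ignores affinity and other development constraints, and the stories, estimates, and capacities are hypothetical.

```python
from itertools import product

# Hypothetical user stories: name -> (estimated complexity, business value).
stories = {
    "sales_cube": (5, 8),
    "customer_dim": (3, 5),
    "etl_orders": (4, 6),
    "returns_report": (2, 3),
}
capacities = [7, 5]  # assumed complexity budgets of sprint 0 and sprint 1


def best_plan(stories, capacities):
    """Try every assignment of stories to a sprint (or to none) and keep the
    feasible assignment with the highest total business value."""
    names = list(stories)
    options = [None, *range(len(capacities))]  # None means "left out"
    best_value, best_assignment = -1, None
    for choice in product(options, repeat=len(names)):
        load = [0] * len(capacities)
        value = 0
        for name, sprint in zip(names, choice):
            if sprint is not None:
                load[sprint] += stories[name][0]
                value += stories[name][1]
        feasible = all(used <= cap for used, cap in zip(load, capacities))
        if feasible and value > best_value:
            best_value, best_assignment = value, dict(zip(names, choice))
    return best_value, best_assignment


value, plan = best_plan(stories, capacities)
print(value, plan)
```

Exhaustive search grows exponentially with the number of stories, so it only serves to make the problem statement concrete; realistic instances need a proper solver.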

Juan Trujillo - One of the best experts on this subject based on the ideXlab platform.

  • DaWaK - Extending the UML for designing association rule mining models for data warehouses
    Data Warehousing and Knowledge Discovery, 2005
    Co-Authors: Jose Zubcoff, Juan Trujillo
    Abstract:

    Association rules (AR) are one of the most popular data mining techniques in searching databases for frequently occurring patterns. In this paper, we present a novel approach to accomplish the conceptual design of data warehouses together with data mining association rules, allowing us to implement the association rules defined in the conceptual modeling phase. The great advantage of our approach is that the association rules are specified from the early stages of a data warehouse project and based on the main final user requirements and data warehouse goals, instead of specifying them on the final database implementation structures such as tables, rows or columns. Finally, to show the benefit of our approach we implement the specified association rules on a commercial data warehouse management server.
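
For readers unfamiliar with association rules, the sketch below computes the two standard measures, support and confidence, for a rule such as "bread implies milk". It illustrates the general technique only and says nothing about the UML extension proposed in the paper; the transactions are made up.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Support of the whole rule divided by support of its antecedent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Here two of the four baskets contain both items, and two of the three baskets with bread also contain milk.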