Data Integration Process

The experts below are selected from a list of 150,195 experts worldwide, ranked by the ideXlab platform.

Markus Helfert - One of the best experts on this subject based on the ideXlab platform.

  • Data quality problems in TPC-DI based Data Integration Processes
    2018
    Co-Authors: Qishan Yang, Markus Helfert
    Abstract:

    Many data-driven organisations need to integrate data from multiple, distributed and heterogeneous sources for advanced data analysis. A data integration system is an essential component for collecting data into a data warehouse or other data analytics systems. There are various alternative data integration systems, built in-house or provided by vendors, so an organisation needs to compare and benchmark them when choosing one that meets its requirements. Recently, TPC-DI was proposed as the first industry benchmark for evaluating data integration systems. When using this benchmark, we found several typical data quality problems in the TPC-DI data source, such as multi-meaning attributes and inconsistent data schemas, which can delay the data integration process or even cause it to fail. This paper explains the processes of this benchmark and summarises the typical data quality problems identified in the TPC-DI data source. Furthermore, to prevent data quality problems and proactively manage data quality, we propose a set of practical guidelines for researchers and practitioners to conduct data quality management when using the TPC-DI benchmark.
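
    To make the two problem classes concrete, here is a minimal Python sketch of a pre-load screening step that flags them in a multi-record-type source file. The record layouts, field names and the sample file name (finwire_sample.csv) are illustrative assumptions, not the actual TPC-DI FINWIRE specification.

        import csv

        # Hypothetical layouts: the same positional field means different
        # things depending on the record type (a "multi-meaning attribute").
        EXPECTED_SCHEMAS = {
            "CMP": ["rec_type", "company_name", "cik"],  # field 2 = company name
            "SEC": ["rec_type", "symbol", "cik"],        # field 2 = ticker symbol
        }

        def screen_source(path):
            """Return (line number, message) pairs for records that violate
            the declared layouts; run before the load phase of an ETL job."""
            problems = []
            with open(path, newline="") as f:
                for line_no, row in enumerate(csv.reader(f), start=1):
                    rec_type = row[0] if row else ""
                    schema = EXPECTED_SCHEMAS.get(rec_type)
                    if schema is None:
                        problems.append((line_no, f"unknown record type {rec_type!r}"))
                    elif len(row) != len(schema):
                        # Inconsistent schema: arity differs from the layout.
                        problems.append((line_no, f"{rec_type}: expected {len(schema)} fields, got {len(row)}"))
            return problems

        if __name__ == "__main__":
            for line_no, msg in screen_source("finwire_sample.csv"):
                print(f"line {line_no}: {msg}")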

  • ICEIS (Revised Selected Papers) - Data Quality Problems in TPC-DI Based Data Integration Processes
    Enterprise Information Systems, 2018
    Co-Authors: Qishan Yang, Markus Helfert
    Abstract: identical to the 2018 entry above.

  • ICEIS (1) - Guidelines of Data Quality Issues for Data Integration in the Context of the TPC-DI Benchmark.
    Proceedings of the 19th International Conference on Enterprise Information Systems, 2017
    Co-Authors: Qishan Yang, Markus Helfert
    Abstract:

    Nowadays, many business intelligence and master data management initiatives are based on regular data integration. Since data integration extracts and combines a variety of data sources, it is considered a prerequisite for data analytics and management. More recently, TPC-DI has been proposed as an industry benchmark for data integration; it is designed to benchmark data integration and to serve as a standard for evaluating ETL performance. There are a variety of data quality problems in source data, such as multi-meaning attributes and inconsistent data schemas, which not only cause problems for the data integration process but also affect further data mining and data analytics. This paper summarises typical data quality problems in data integration and adapts the traditional data quality dimensions to classify them. We found that data completeness, timeliness and consistency are critical for data quality management in data integration, and that data consistency should be further defined at the pragmatic level. To prevent typical data quality problems and proactively manage data quality in ETL, we propose a set of practical guidelines for researchers and practitioners to conduct data quality management in data integration.
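
    As a rough illustration of the three dimensions the paper singles out, the sketch below scores a batch of source records for completeness, timeliness and consistency before loading. The field names, freshness window and reference-table check are assumptions made for the example, not details taken from the paper or from TPC-DI.

        from datetime import datetime, timedelta

        REQUIRED = ("customer_id", "account_id", "trade_date")
        MAX_AGE = timedelta(days=1)  # assumed freshness window for timeliness

        def completeness(records):
            """Fraction of records with every required field populated."""
            ok = sum(all(r.get(f) not in (None, "") for f in REQUIRED)
                     for r in records)
            return ok / len(records) if records else 1.0

        def timeliness(records, now=None):
            """Fraction of records whose trade_date is inside the window."""
            now = now or datetime.utcnow()
            ok = sum(1 for r in records
                     if r.get("trade_date") and now - r["trade_date"] <= MAX_AGE)
            return ok / len(records) if records else 1.0

        def consistency(records, reference_ids):
            """Fraction of records whose customer_id exists in a reference
            table; a syntactic check, whereas the pragmatic-level consistency
            the paper calls for needs domain rules on top of this."""
            ok = sum(1 for r in records if r.get("customer_id") in reference_ids)
            return ok / len(records) if records else 1.0

        if __name__ == "__main__":
            batch = [{"customer_id": "C1", "account_id": "A1",
                      "trade_date": datetime.utcnow()}]
            print(completeness(batch), timeliness(batch),
                  consistency(batch, {"C1"}))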

  • Guidelines of Data quality issues for Data Integration in the context of the TPC-DI benchmark
    2017
    Co-Authors: Qishan Yang, Markus Helfert
    Abstract: identical to the preceding ICEIS 2017 entry.

Qishan Yang - One of the best experts on this subject based on the ideXlab platform.

  • Co-author, with Markus Helfert, of the four TPC-DI publications listed above; the entries and abstracts are identical to those under Markus Helfert.

Janusz R Getta - One of the best experts on this subject based on the ideXlab platform.

  • Query Decomposition Strategy for Integration of Semistructured Data
    Information Integration and Web-based Applications & Services, 2014
    Co-Authors: Handoko, Janusz R Getta
    Abstract:

    Data integration systems provide a unified view of various sources of data distributed over wide-area networks. User requests issued at a central site must be decomposed into a number of sub-requests that are then processed at the remote sites; the results are integrated at the central site and returned to the user. The decomposition strategy for global user requests and the scheduling of sub-requests at the central site have a significant impact on the performance of the data integration process. This paper proposes an efficient decomposition strategy for systems that integrate semistructured data. We define a new system of operations on XML documents to represent XQuery user requests and the results of decomposing such requests. A cost-based optimisation is used to find the optimal size of sub-requests and their optimal scheduling at the central site.
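
    The decompose-then-schedule idea can be sketched with a toy cost model: split a global request into per-source sub-requests, estimate each one's cost, and run the cheapest first. The request representation, catalog and cost formula below are invented for illustration; the paper's actual operations work on XML documents and XQuery requests.

        from dataclasses import dataclass

        @dataclass
        class SubRequest:
            source: str         # remote site that evaluates this fragment
            fragment: str       # query fragment shipped to the site
            est_rows: int       # estimated result size
            est_latency: float  # estimated round-trip cost in seconds

        def decompose(global_request, catalog):
            """One sub-request per source that holds a referenced collection
            (a stand-in for the paper's operations on XML documents)."""
            return [SubRequest(source=site, fragment=f"{global_request} @ {site}",
                               est_rows=meta["rows"], est_latency=meta["latency"])
                    for site, meta in catalog.items()]

        def schedule(subs):
            """Cheapest first, so small, fast sub-requests are available for
            integration at the central site while expensive ones still run."""
            return sorted(subs, key=lambda r: r.est_latency * r.est_rows)

        catalog = {"siteA": {"rows": 10_000, "latency": 0.4},
                   "siteB": {"rows": 500, "latency": 0.1}}
        for sr in schedule(decompose("//order[total>100]", catalog)):
            print(sr.source, sr.fragment)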

  • iiWAS - Query Decomposition Strategy for Integration of Semistructured Data
    Proceedings of the 16th International Conference on Information Integration and Web-based Applications & Services, 2014
    Co-Authors: Handoko, Janusz R Getta
    Abstract: identical to the 2014 entry above.

Xiaogang Ma - One of the best experts on this subject based on the ideXlab platform.

  • ICDE - VisFlow: A Visual Database Integration and Workflow Querying System
    2017 IEEE 33rd International Conference on Data Engineering (ICDE), 2017
    Co-Authors: Hasan M. Jamil, Xiaogang Ma
    Abstract:

    The adoption and availability of diverse application design and support platforms are making generic scientific application orchestration increasingly difficult. In such an evolving environment, higher-level abstractions of design primitives are critically important, so that end users have a chance to craft their own applications without a complete technical grasp of the lower-level details. In this research, we introduce VisFlow, a novel scientific workflow design platform that supports, in a single system, high-level tools for data integration, process description and analytics, based on a visual language for naive users together with advanced options for computing-savvy programmers. We describe its salient features and advantages using a complex scientific application in natural resources and ecology. Video: https://youtu.be/ 2YSYVyOuuk.
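
    The abstract's central idea, a workflow captured as a data structure that a visual editor manipulates and an engine then executes, can be suggested with a toy linear node graph. Everything below (node names, functions, execution rule) is invented for illustration and is not VisFlow's actual design.

        def load(_):
            return [3, 1, 2]        # stand-in for a data source

        def sort_step(values):
            return sorted(values)   # stand-in for a transformation node

        def head(values):
            return values[:2]       # stand-in for an analytics/output node

        # The "visual" program: nodes plus the edges a canvas editor draws.
        NODES = {"load": load, "sort": sort_step, "head": head}
        EDGES = [("load", "sort"), ("sort", "head")]

        def run(nodes, edges):
            """Execute a linear chain of nodes in edge order, piping each
            node's output into the next node."""
            value = None
            order = [edges[0][0]] + [dst for _, dst in edges]
            for name in order:
                value = nodes[name](value)
            return value

        print(run(NODES, EDGES))  # [1, 2]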

Faouzia Wadjinny - One of the best experts on this subject based on the ideXlab platform.

  • Managing Network Dynamicity in a Vector Space Model for Semantic P2P Data Integration
    Communications in Computer and Information Science, 2011
    Co-Authors: Ahmed Moujane, Dalila Chiadmi, Laila Benhlima, Faouzia Wadjinny
    Abstract:

    P2P data integration has been one of the prominent research topics in recent years. It rests on two principal axes, data integration and P2P computing, and aims to combine the advantages of both to overcome the shortcomings of centralized solutions. However, dynamicity and large scale are the most difficult challenges for efficient solutions. In this paper, we review the fundamentals of P2P computing and data integration and detail the challenges facing the P2P data integration process. In addition, we present a vector-space-model-based approach within our P2P semantic data integration framework. In a first stage, we detail the various modules of our framework and specify the functions of each one. Then, we present our vector space model for representing semantic knowledge, as well as the components of the knowledge base that holds the semantics. Finally, we explain how we deal with network dynamicity and how the semantics should be adjusted accordingly.
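
    A minimal sketch of the vector-space idea: represent each peer's schema concepts as weighted term vectors and propose semantic mappings between peers by cosine similarity. The vocabulary, weights and threshold are illustrative assumptions; the paper's model and knowledge base are richer.

        import math

        def cosine(u, v):
            """Cosine similarity of two sparse term vectors (term -> weight)."""
            dot = sum(w * v.get(t, 0.0) for t, w in u.items())
            norm_u = math.sqrt(sum(w * w for w in u.values()))
            norm_v = math.sqrt(sum(w * w for w in v.values()))
            return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

        # Hypothetical concept vectors for two peers' schema elements.
        peer_a = {"customer": {"person": 0.8, "buyer": 0.6}}
        peer_b = {"client": {"person": 0.7, "buyer": 0.5, "account": 0.2}}

        THRESHOLD = 0.6  # assumed cut-off for proposing a mapping
        for concept_a, vec_a in peer_a.items():
            for concept_b, vec_b in peer_b.items():
                sim = cosine(vec_a, vec_b)
                if sim >= THRESHOLD:
                    print(f"map {concept_a!r} <-> {concept_b!r} ({sim:.2f})")

    Under network dynamicity, a peer joining or leaving would then trigger recomputation of exactly the mappings that involve its concepts.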

  • AICCSA - A study in the P2P Data Integration Process
    2009 IEEE ACS International Conference on Computer Systems and Applications, 2009
    Co-Authors: Ahmed Moujane, Dalila Chiadmi, Laila Benhlima, Faouzia Wadjinny
    Abstract:

    In recent years, the issue of heterogeneity and data sharing has been discussed in different contexts and from diverse points of view. Two significant axes stand out: data integration and P2P computing. Data integration aims to hide the heterogeneity of distributed sources, but most data integration solutions have a centralized architecture. The emergence of P2P technologies has changed the way distributed data is managed, offering more scalability and flexibility. Combining the advantages of data integration and P2P technologies would help overcome centralized solutions, but doing so is not challenge-free. PDMS (Peer Data Management System) networks, in particular, have a number of important advantages over earlier, flatter P2P networks. In this paper, we investigate some basic notions of P2P computing and data integration and detail the challenges facing the P2P data integration process.