Data Lineage

The Experts below are selected from a list of 90 Experts worldwide ranked by ideXlab platform

E Valentijn - One of the best experts on this subject based on the ideXlab platform.

  • Tracing and using data lineage for pipeline processing in Astro-WISE
    Experimental Astronomy, 2013
    Co-Authors: Johnson Mwebaze, Danny Boxhoorn, E Valentijn
    Abstract:

    Most workflow systems that support data provenance primarily focus on tracing the lineage of data. Data provenance by data lineage provides the derivation history of data, including information about the services and input data that contributed to the creation of a data product. We show that tracing lineage by means of full backward chaining not only enables users to share, discover and reuse the data, but also supports scientific data processing through storage, retrieval and (re)processing of digitized scientific data. In this paper, we present Astro-WISE, a distributed system for processing, analyzing and disseminating wide-field imaging astronomical data. We show how Astro-WISE traces the lineage of data and how it facilitates data processing, retrieval, storage and archiving. In particular, we show how it solves issues related to the changing data items typical of the scientific environment, such as physical changes in calibrations, our insight into these changes and improved methods for deriving results.

  • Astro-WISE: Tracing and using lineage for scientific data processing
    NBiS 2009 - 12th International Conference on Network-Based Information Systems, 2009
    Co-Authors: Johnson Mwebaze, Danny Boxhoorn, E Valentijn
    Abstract:

    Most workflow systems that support data provenance primarily focus on tracing the lineage of data. Data provenance by data lineage provides the derivation history of data, including information about the services and input data that contributed to the creation of a data product. We show that tracing lineage by means of full backward chaining not only enables users to share, discover and reuse the data, but also supports scientific data processing through storage, retrieval and (re)processing of digitized scientific data. In this paper, we present Astro-WISE, a distributed system for processing, analyzing and disseminating wide-field imaging astronomical data. We show how Astro-WISE traces the lineage of data and how it facilitates data processing, retrieval, storage and archiving. In particular, we show how it solves issues related to the changing data items typical of the scientific environment, such as physical changes in calibrations, our insight into these changes and improved methods for deriving results.
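
The backward chaining described in these abstracts can be illustrated with a minimal sketch. It assumes each data product stores references to the inputs and the process that created it; the `Product` class, the `trace_lineage` function and the frame names are hypothetical and are not part of the Astro-WISE API.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Product:
    """A data product that remembers how it was made (hypothetical model)."""
    name: str
    process: str = "raw"                      # step that produced this product
    inputs: list[Product] = field(default_factory=list)

def trace_lineage(product, seen=None):
    """Walk backward from a product to every ancestor, depth-first.

    Returns (name, process) pairs; each product is reported once.
    """
    if seen is None:
        seen = set()
    if product.name in seen:
        return []
    seen.add(product.name)
    history = [(product.name, product.process)]
    for parent in product.inputs:
        history.extend(trace_lineage(parent, seen))
    return history

# Hypothetical calibration chain: a reduced frame depends on raw data,
# a bias frame, and a flat field that itself depends on the bias frame.
raw = Product("raw_frame")
bias = Product("bias_frame")
flat = Product("flat_field", "make_flat", [bias])
reduced = Product("reduced_frame", "reduce", [raw, bias, flat])

lineage = trace_lineage(reduced)
for name, process in lineage:
    print(f"{name} <- {process}")
```

Because every product carries its own derivation, a changed calibration (here, a new bias frame) can in principle be found in the lineage of each dependent product, which is what makes selective reprocessing possible.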

Arunprasad P. Marathe - One of the best experts on this subject based on the ideXlab platform.

  • Tracing lineage of array data
    Statistical and Scientific Database Management, 2001
    Co-Authors: Arunprasad P. Marathe
    Abstract:

    Arrays are a common and important class of data. They can model digital images, digital video, scientific and experimental data, matrices, finite element grids, and many other types of data. Although array manipulations are diverse and domain-specific, they often exhibit structural regularities. The paper presents an algorithm called SUB-pushdown to compute data lineage in such array computations. The array manipulations are expressed in the Array Manipulation Language (AML) that was introduced previously (A.P. Marathe and K. Salem, 1997). SUB-pushdown has several useful features. First, the lineage computation is expressed as an AML query. Second, it is not necessary to evaluate the AML lineage query to compute the array data lineage. Third, SUB-pushdown never gives false-negative answers. SUB-pushdown has been implemented as part of the ArrayDB prototype array database system that we built (A.P. Marathe, 2001).

  • Tracing lineage of array data
    Journal of Intelligent Information Systems, 2001
    Co-Authors: Arunprasad P. Marathe
    Abstract:

    Arrays are a common and important class of data in many applications. Arrays can model data such as digital images, digital video, scientific and experimental data, matrices, and finite element grids. Although array manipulations are diverse and domain-specific, they often exhibit structural regularities. This paper describes an algorithm called SUB-pushdown to trace data lineage in such array computations. Lineage tracing is a type of data-flow analysis that relates parts of a result array to those parts of the argument (base) arrays that have a bearing on the result array parts. SUB-pushdown can be used to trace data lineage in array-manipulating computations expressed in the Array Manipulation Language (AML) that was introduced previously. SUB-pushdown has several useful features. First, the lineage computation is expressed as an AML query. Second, it is not necessary to evaluate the AML lineage query to compute the array data lineage. Third, SUB-pushdown never gives false-negative answers. SUB-pushdown has been implemented as part of the ArrayDB prototype array database system that we have built.
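
To make the question that array lineage tracing answers more concrete, here is a small sketch. It is not SUB-pushdown and does not use AML; it only shows, for one structurally regular operation (averaging non-overlapping k x k blocks), how the base cells behind a result cell follow from index arithmetic. The function names `block_mean` and `block_mean_lineage` are assumptions made for this illustration.

```python
def block_mean(base, k):
    """Downsample a 2-D list by averaging non-overlapping k x k blocks."""
    rows, cols = len(base) // k, len(base[0]) // k
    return [
        [
            sum(base[r * k + i][c * k + j] for i in range(k) for j in range(k)) / (k * k)
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def block_mean_lineage(r, c, k):
    """Base-array cells that contribute to result cell (r, c).

    Derived from the operation's index structure alone; the base
    array values are never read.
    """
    return [(r * k + i, c * k + j) for i in range(k) for j in range(k)]

base = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
result = block_mean(base, 2)
cells = block_mean_lineage(1, 0, 2)
print(result[1][0], cells)
```

The regular block structure is what lets the lineage be stated as a simple index mapping; less regular manipulations would need a more general analysis of the kind the abstracts describe.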

Johnson Mwebaze - One of the best experts on this subject based on the ideXlab platform.

  • Tracing and using data lineage for pipeline processing in Astro-WISE
    Experimental Astronomy, 2013
    Co-Authors: Johnson Mwebaze, Danny Boxhoorn, E Valentijn
    Abstract:

    Most workflow systems that support data provenance primarily focus on tracing the lineage of data. Data provenance by data lineage provides the derivation history of data, including information about the services and input data that contributed to the creation of a data product. We show that tracing lineage by means of full backward chaining not only enables users to share, discover and reuse the data, but also supports scientific data processing through storage, retrieval and (re)processing of digitized scientific data. In this paper, we present Astro-WISE, a distributed system for processing, analyzing and disseminating wide-field imaging astronomical data. We show how Astro-WISE traces the lineage of data and how it facilitates data processing, retrieval, storage and archiving. In particular, we show how it solves issues related to the changing data items typical of the scientific environment, such as physical changes in calibrations, our insight into these changes and improved methods for deriving results.

  • Astro-WISE: Tracing and using lineage for scientific data processing
    NBiS 2009 - 12th International Conference on Network-Based Information Systems, 2009
    Co-Authors: Johnson Mwebaze, Danny Boxhoorn, E Valentijn
    Abstract:

    Most workflow systems that support data provenance primarily focus on tracing the lineage of data. Data provenance by data lineage provides the derivation history of data, including information about the services and input data that contributed to the creation of a data product. We show that tracing lineage by means of full backward chaining not only enables users to share, discover and reuse the data, but also supports scientific data processing through storage, retrieval and (re)processing of digitized scientific data. In this paper, we present Astro-WISE, a distributed system for processing, analyzing and disseminating wide-field imaging astronomical data. We show how Astro-WISE traces the lineage of data and how it facilitates data processing, retrieval, storage and archiving. In particular, we show how it solves issues related to the changing data items typical of the scientific environment, such as physical changes in calibrations, our insight into these changes and improved methods for deriving results.

Lingjun Kang - One of the best experts on this subject based on the ideXlab platform.

  • Implementation of Geospatial Data Provenance in a Web Service Workflow Environment With ISO 19115 and ISO 19115-2 Lineage Model
    IEEE Transactions on Geoscience and Remote Sensing, 2013
    Co-Authors: Liping Di, Yuanzheng Shao, Lingjun Kang
    Abstract:

    Data provenance, also called data lineage, records the derivation history of a data product. In the earth science domain, geospatial data provenance is important because it plays a significant role in data quality and usability evaluation, data trail auditing, workflow replication, and product reproducibility. The generation of geospatial provenance metadata is usually coupled with the execution of a geo-processing workflow. Their symbiotic relationship makes them complementary to each other and promises great benefit once they are integrated. However, the heterogeneity of data and computing resources in the distributed environment constructed under the service-oriented architecture (SOA) poses a great challenge to resource integration. Specifically, issues such as the lack of interoperability and compatibility among provenance metadata models and between provenance and workflow create obstacles to the integration of provenance and geo-processing workflow. To tackle these issues, on the one hand, this paper overcomes provenance heterogeneity by recording provenance information in a standard lineage model defined in the ISO 19115:2003 and ISO 19115-2:2009 standards. On the other hand, this paper bridges the gap between provenance and geo-processing workflow by extending both the workflow language and the service interface, making it possible to capture provenance information automatically in the geospatial web service environment. The proposed method is implemented in GeoBrain, an SOA-based geospatial web service system. Test results from the implementation show that the geospatial provenance information is successfully captured throughout the life cycle of geo-processing workflows and properly recorded in the ISO standard lineage model.
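
The idea of capturing provenance automatically while a workflow runs can be sketched as follows. This is an illustration under stated assumptions, not the GeoBrain implementation: the `capture_provenance` decorator, the `lineage_log` store, the record keys and the two processing functions are all hypothetical. The keys are only loosely modeled on the process-step and source concepts of the ISO 19115 lineage model and do not follow the standard's schema.

```python
from datetime import datetime, timezone
from functools import wraps

lineage_log = []  # hypothetical in-memory store of captured process steps

def capture_provenance(func):
    """Record one lineage entry each time the wrapped service runs."""
    @wraps(func)
    def wrapper(*sources):
        output = func(*sources)
        lineage_log.append({
            "process_step": func.__name__,                        # which service ran
            "date_time": datetime.now(timezone.utc).isoformat(),  # when it ran
            "sources": list(sources),                             # inputs it consumed
            "output": output,                                     # product it created
        })
        return output
    return wrapper

@capture_provenance
def reproject(dataset):
    # Stand-in for a geo-processing service; returns an identifier string.
    return f"{dataset}_reprojected"

@capture_provenance
def mosaic(*datasets):
    # Stand-in for a service that combines several inputs.
    return "mosaic(" + ",".join(datasets) + ")"

final = mosaic(reproject("tile_a"), reproject("tile_b"))
for step in lineage_log:
    print(step["process_step"], step["sources"], "->", step["output"])
```

Wrapping the service call keeps provenance capture out of the processing code itself, which mirrors the abstract's point that capture is tied to the service interface and workflow rather than to each algorithm.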

Alexandra Poulovassilis - One of the best experts on this subject based on the ideXlab platform.

  • Using schema transformation pathways for data lineage tracing
    Knowledge Transformation for the Semantic Web, 2010
    Co-Authors: Hao Fan, Alexandra Poulovassilis
    Abstract:

    With the increasing amount and diversity of information available on the Internet, there has been a huge growth in information systems that need to integrate data from distributed, heterogeneous data sources. Tracing the lineage of the integrated data is one of the problems being addressed in data warehousing research. This paper presents a data lineage tracing approach based on schema transformation pathways. Our approach is not limited to one specific data model or query language, and would be useful in any data transformation/integration framework based on sequences of primitive schema transformations.

  • Tracing data lineage using schema transformation pathways
    Knowledge Transformation for the Semantic Web, 2003
    Co-Authors: Alexandra Poulovassilis
    Abstract:

    With the increasing amount and diversity of information available on the Internet, there has been a huge growth in information systems that need to integrate data from distributed, heterogeneous data sources. Tracing the lineage of the integrated data is one of the current problems being addressed in data warehouse research. In this chapter, we propose a new approach for tracing data lineage which is based on schema transformation pathways. We show how the individual transformation steps in a transformation pathway can be used to trace the derivation of the integrated data in a step-wise fashion. Although developed for a graph-based common data model and a functional query language, our approach is not limited to these and would be useful in any data transformation/integration framework based on sequences of primitive schema transformations.
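
Step-wise tracing along a sequence of transformations can be sketched as below. This is a simplified illustration, not the authors' method: rows are plain tuples rather than schema constructs, and `apply_pathway`, `trace_back` and the two example steps are hypothetical names. Each step reports, for every output row, which input rows it was derived from, and tracing walks those per-step maps in reverse.

```python
def apply_pathway(rows, steps):
    """Run each step in order and keep one output-to-input index map per step."""
    maps = []
    for step in steps:
        pairs = step(rows)                      # list of (new_row, [input indices])
        rows = [row for row, _ in pairs]
        maps.append([srcs for _, srcs in pairs])
    return rows, maps

def trace_back(index, maps):
    """Follow the per-step maps in reverse from one result row to its source rows."""
    current = {index}
    for step_map in reversed(maps):
        current = {src for i in current for src in step_map[i]}
    return sorted(current)

def keep_positive(rows):
    """Step 1 (hypothetical): drop rows whose value is not positive."""
    return [(row, [i]) for i, row in enumerate(rows) if row[1] > 0]

def sum_by_key(rows):
    """Step 2 (hypothetical): group rows by key and sum their values."""
    groups = {}
    for i, (key, value) in enumerate(rows):
        total, srcs = groups.get(key, (0, []))
        groups[key] = (total + value, srcs + [i])
    return [((key, total), srcs) for key, (total, srcs) in groups.items()]

source = [("a", 1), ("b", -2), ("a", 3), ("b", 4)]
result, maps = apply_pathway(source, [keep_positive, sum_by_key])
origin = trace_back(0, maps)
print(result[0], "derives from", [source[i] for i in origin])
```

Tracing one step at a time means no step needs to know about the whole pathway; adding a further transformation only adds one more map to walk back through.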