Pay-as-You-Go

The Experts below are selected from a list of 13,530 Experts worldwide, ranked by the ideXlab platform.

Suzanne Embury - One of the best experts on this subject based on the ideXlab platform.

  • Pay-as-You-Go Configuration of Entity Resolution
    Lecture Notes in Computer Science, 2016
    Co-Authors: Ruhaila Maskat, Norman W. Paton, Suzanne Embury
    Abstract:

    Entity resolution, which seeks to identify records that represent the same entity, is an important step in many data integration and data cleaning applications. However, entity resolution is challenging both in terms of scalability (all-against-all comparisons are computationally impractical) and result quality (syntactic evidence on record equivalence is often equivocal). As a result, end-to-end entity resolution proposals involve several stages, including blocking to efficiently identify candidate duplicates, detailed comparison to refine the conclusions from blocking, and clustering to identify the sets of records that may represent the same entity. However, the quality of the result is often crucially dependent on configuration parameters in all of these stages, for which it may be difficult for a human expert to provide suitable values. This paper describes an approach in which a complete entity resolution process is optimized, on the basis of feedback such as might be obtained from crowds on candidate duplicates. Given such feedback, an evolutionary search of the space of configuration parameters is carried out, with a view to maximizing the fitness of the resulting clusters. The approach is Pay-as-You-Go in that more feedback can be expected to give rise to better outcomes. An empirical evaluation shows that the co-optimization of the different stages in entity resolution can yield significant improvements over default parameters, even with small amounts of feedback.
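The feedback-driven configuration search described in this abstract can be sketched as a toy evolutionary loop. This is an illustrative sketch, not the paper's implementation: the single `threshold` parameter, the hard-coded similarity table, and the function names are invented stand-ins for a full blocking/comparison/clustering pipeline with many parameters.

```python
# Sketch: evolutionary search over an ER configuration, scored against
# crowd feedback on candidate duplicate pairs. All names and data are
# hypothetical; a real pipeline would have many more parameters.
import random

# Hypothetical crowd feedback: pair -> True if judged a duplicate.
feedback = {
    ("r1", "r2"): True,
    ("r1", "r3"): False,
    ("r2", "r3"): False,
}

def resolve(config, pairs):
    """Stand-in for blocking + comparison + clustering: declares a pair a
    duplicate when a (fake) similarity meets the configured threshold."""
    sim = {("r1", "r2"): 0.9, ("r1", "r3"): 0.4, ("r2", "r3"): 0.3}
    return {p: sim[p] >= config["threshold"] for p in pairs}

def fitness(config):
    """Fraction of feedback pairs on which the pipeline agrees with the crowd."""
    decisions = resolve(config, feedback.keys())
    return sum(decisions[p] == feedback[p] for p in feedback) / len(feedback)

def evolve(generations=20, pop_size=8, seed=0):
    """Keep the fittest configurations and refill the population by mutation."""
    rng = random.Random(seed)
    pop = [{"threshold": rng.random()} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [
            {"threshold": min(1.0, max(0.0, c["threshold"] + rng.gauss(0, 0.1)))}
            for c in survivors
        ]
    return max(pop, key=fitness)

best = evolve()
```

More feedback pairs would sharpen the fitness signal, which is what makes the approach Pay-as-You-Go.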

  • SOFSEM - Pay-as-You-Go Data Integration: Experiences and Recurring Themes
    Lecture Notes in Computer Science, 2016
    Co-Authors: Norman W. Paton, Suzanne Embury, Alvaro A. A. Fernandes, Khalid Belhajjame, Ruhaila Maskat
    Abstract:

    Data integration typically seeks to provide the illusion that data from multiple distributed sources comes from a single, well managed source. Providing this illusion in practice tends to involve the design of a global schema that captures the users' data requirements, followed by manual (with tool support) construction of mappings between sources and the global schema. This overall approach can provide high quality integrations but at high cost, and tends to be unsuitable for areas with large numbers of rapidly changing sources, where users may be willing to cope with a less than perfect integration. Pay-as-You-Go data integration has been proposed to overcome the need for costly manual data integration. Pay-as-You-Go data integration tends to involve two steps. Initialisation: automatic creation of mappings (generally of poor quality) between sources. Improvement: the obtaining of feedback on some aspect of the integration, and the application of this feedback to revise the integration. There has been considerable research in this area over a ten-year period. This paper reviews some experiences with Pay-as-You-Go data integration, providing a framework that can be used to compare or develop Pay-as-You-Go data integration techniques.
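The two-step Initialisation/Improvement loop described in this abstract can be sketched minimally as follows. All function names and the attribute-name bootstrap are invented for illustration; they are not from the paper.

```python
# Sketch of the Pay-as-You-Go loop: bootstrap cheap, likely-imperfect
# mappings, then fold user feedback back into the integration.

def initialise(sources):
    """Initialisation: automatically derive candidate mappings.
    Toy bootstrap: map every source attribute to a same-named global attribute."""
    return {(src, attr): attr for src, attrs in sources.items() for attr in attrs}

def improve(mappings, user_feedback):
    """Improvement: apply feedback ((source, attr) -> correct?) to revise
    the integration, here by dropping mappings flagged as wrong."""
    return {m: target for m, target in mappings.items()
            if user_feedback.get(m, True)}

sources = {"s1": ["name", "tel"], "s2": ["name", "phone"]}
mappings = initialise(sources)
# A user marks the automatic mapping for s2.phone as wrong:
mappings = improve(mappings, {("s2", "phone"): False})
```

Each round of feedback refines the mapping set, so integration quality improves incrementally rather than requiring up-front design.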

  • SIGMOD Conference - Pay-as-You-Go mapping selection in dataspaces
    Proceedings of the 2011 international conference on Management of data - SIGMOD '11, 2011
    Co-Authors: Cornelia Hedeler, Suzanne Embury, Alvaro A. A. Fernandes, Khalid Belhajjame, Norman W. Paton, Lu Mao, Chenjuan Guo
    Abstract:

    The vision of dataspaces proposes an alternative to classical data integration approaches with reduced up-front costs followed by incremental improvement on a Pay-as-You-Go basis. In this paper, we demonstrate DSToolkit, a system that allows users to provide feedback on results of queries posed over an integration schema. Such feedback is then used to annotate the mappings with their respective precision and recall. The system then allows a user to state the expected levels of precision (or recall) that the query results should exhibit and, in order to produce those results, the system selects those mappings that are predicted to meet the stated constraints.
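The mapping-selection idea in this abstract can be sketched as a simple filter. This is an illustrative sketch with invented names, not DSToolkit's API: each mapping is annotated with a precision estimate from feedback counts, and only mappings predicted to meet the user's stated precision constraint are selected.

```python
# Sketch: select mappings whose feedback-estimated precision meets a
# user-stated threshold. Feedback counts and mapping names are hypothetical.

# Per-mapping feedback: how many of its query results users marked
# correct vs incorrect.
mapping_feedback = {
    "m1": {"correct": 18, "incorrect": 2},
    "m2": {"correct": 5, "incorrect": 5},
    "m3": {"correct": 9, "incorrect": 1},
}

def estimated_precision(counts):
    """Precision estimate for one mapping from its feedback counts."""
    total = counts["correct"] + counts["incorrect"]
    return counts["correct"] / total if total else 0.0

def select_mappings(feedback, min_precision):
    """Keep only mappings predicted to meet the stated precision constraint."""
    return sorted(
        m for m, counts in feedback.items()
        if estimated_precision(counts) >= min_precision
    )

selected = select_mappings(mapping_feedback, min_precision=0.8)
```

A recall constraint would work symmetrically, trading fewer selected mappings (higher precision) against more complete answers.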

Norman W. Paton - One of the best experts on this subject based on the ideXlab platform.

  • Pay-as-You-Go Configuration of Entity Resolution
    Lecture Notes in Computer Science, 2016
    Co-Authors: Ruhaila Maskat, Norman W. Paton, Suzanne Embury

  • SOFSEM - Pay-as-You-Go Data Integration: Experiences and Recurring Themes
    Lecture Notes in Computer Science, 2016
    Co-Authors: Norman W. Paton, Suzanne Embury, Alvaro A. A. Fernandes, Khalid Belhajjame, Ruhaila Maskat

  • SWIM - Pay-as-You-Go data integration for linked data: opportunities, challenges and architectures
    Proceedings of the 4th International Workshop on Semantic Web Information Management - SWIM '12, 2012
    Co-Authors: Norman W. Paton, Alvaro A. A. Fernandes, Klitos Christodoulou, Bijan Parsia, Cornelia Hedeler
    Abstract:

    Linked Data (LD) provides principles for publishing data that underpin the development of an emerging web of data. LD follows the web in providing low barriers to entry: publishers can make their data available using a small set of standard technologies, and consumers can search for and browse published data using generic tools. Like the web, consumers frequently consume data in broadly the form in which it was published; this will be satisfactory in some cases, but the diversity of publishers means that the data required to support a task may be stored in many different sources, and described in many different ways. As such, although RDF provides a syntactically homogeneous language for describing data, sources typically manifest a wide range of heterogeneities, in terms of how data on a concept is represented. This paper makes the case that many aspects of both publication and consumption of LD stand to benefit from a Pay-as-You-Go approach to data integration. Specifically, the paper: (i) identifies a collection of opportunities for applying Pay-as-You-Go techniques to LD; (ii) describes some preliminary experiences applying a Pay-as-You-Go data integration system to LD; and (iii) presents some open issues that need to be addressed to enable the full benefits of Pay-as-You-Go integration to be realised.

  • SIGMOD Conference - Pay-as-You-Go mapping selection in dataspaces
    Proceedings of the 2011 international conference on Management of data - SIGMOD '11, 2011
    Co-Authors: Cornelia Hedeler, Suzanne Embury, Alvaro A. A. Fernandes, Khalid Belhajjame, Norman W. Paton, Lu Mao, Chenjuan Guo

Ruhaila Maskat - One of the best experts on this subject based on the ideXlab platform.

  • Pay-as-You-Go Configuration of Entity Resolution
    Lecture Notes in Computer Science, 2016
    Co-Authors: Ruhaila Maskat, Norman W. Paton, Suzanne Embury

  • SOFSEM - Pay-as-You-Go Data Integration: Experiences and Recurring Themes
    Lecture Notes in Computer Science, 2016
    Co-Authors: Norman W. Paton, Suzanne Embury, Alvaro A. A. Fernandes, Khalid Belhajjame, Ruhaila Maskat

Alon Halevy - One of the best experts on this subject based on the ideXlab platform.

  • Functional Dependency Generation and Applications in Pay-as-You-Go Data Integration Systems
    International Workshop on the Web and Databases, 2009
    Co-Authors: Daisy Zhe Wang, Anish Das Sarma, Michael J. Franklin, Xin Luna Dong, Alon Halevy
    Abstract:

    Recently, the opportunity of extracting structured data from the Web has been identified by a number of research projects. One such example is that millions of relational-style HTML tables can be extracted from the Web. Traditional data integration approaches do not scale over such corpora with hundreds of small tables in one domain. To solve this problem, previous work has proposed Pay-as-You-Go data integration systems to provide, with little up-front cost, base services over loosely-integrated information. One key component of such systems, which has received little attention to date, is the need for a framework to gauge and improve the quality of the integration. We propose a framework based on functional dependencies (FDs). Unlike in traditional database design, where FDs are specified as statements of truth about all possible instances of the database, in the web environment FDs are not specified over the data tables. Instead, we generate FDs by counting-based algorithms over many data sources, and extend the FDs with probabilities to capture the inherent uncertainties in them. Given these probabilistic FDs, we show how to solve two problems to improve data and schema quality in a Pay-as-You-Go system: (1) pinpointing dirty data sources and (2) normalizing large mediated schemas. We describe these techniques and evaluate them over real-world data sets extracted from the Web.
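The counting-based generation of probabilistic FDs described here can be sketched as follows. This is an illustrative toy, not the paper's algorithm: the probability of A → B is estimated simply as the fraction of sources whose data satisfies the dependency exactly, and the table data and column names are invented.

```python
# Sketch: estimate P(A -> B) by counting, over many sources, how often
# each A-value maps to a single B-value.

def holds(table, a, b):
    """Does A -> B hold in this table (every A-value has one B-value)?"""
    seen = {}
    for row in table:
        if row[a] in seen and seen[row[a]] != row[b]:
            return False
        seen[row[a]] = row[b]
    return True

def fd_probability(sources, a, b):
    """Probabilistic FD: fraction of sources in which A -> B holds."""
    return sum(holds(t, a, b) for t in sources) / len(sources)

sources = [
    [{"zip": "100", "city": "NY"}, {"zip": "200", "city": "LA"}],
    [{"zip": "100", "city": "NY"}, {"zip": "100", "city": "NY"}],
    [{"zip": "100", "city": "NY"}, {"zip": "100", "city": "Boston"}],  # violates
]
p = fd_probability(sources, "zip", "city")
```

Weighting sources by size or trust, as a real system might, would refine the estimate; the per-source count alone already separates clean FDs from accidental ones.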

  • Discovering Functional Dependencies in Pay-as-You-Go Data Integration Systems
    2009
    Co-Authors: Daisy Zhe Wang, Anish Das Sarma, Michael J. Franklin, Luna Dong, Alon Halevy
    Abstract:

    Functional dependency is one of the most extensively researched subjects in database theory, originally for improving quality of schemas, and recently for improving quality of data. In a Pay-as-You-Go data integration system, where the goal is to provide best-effort service even without thorough understanding of the underlying domain and the various data sources, functional dependency can play an even more important role, applied in normalizing an automatically generated mediated schema, pinpointing sources of low quality, resolving conflicts in data from different sources, improving efficiency of query answering, and so on. Despite its importance, discovering functional dependencies in such a context is challenging: we cannot assume upfront domain knowledge for specifying dependencies, and the data can be dirty, incomplete, or even misinterpreted, making automatic discovery of dependencies hard. This paper studies how one can automatically discover functional dependencies in a Pay-as-You-Go data integration system. We introduce the notion of probabilistic functional dependencies (pFDs) and design Bayes models that compute probabilities of dependencies according to data from various sources. As an application, we study how to normalize a mediated schema based on the pFDs we generate. Experiments on real-world data sets with tens or hundreds of data sources show that our techniques obtain high precision and recall in dependency discovery and generate high-quality results in mediated-schema normalization.

  • Bootstrapping Pay-as-You-Go Data Integration Systems
    International Conference on Management of Data, 2008
    Co-Authors: Anish Das Sarma, Xin Dong, Alon Halevy
    Abstract:

    Data integration systems offer a uniform interface to a set of data sources. Despite recent progress, setting up and maintaining a data integration application still requires significant upfront effort of creating a mediated schema and semantic mappings from the data sources to the mediated schema. Many application contexts involving multiple data sources (e.g., the web, personal information management, enterprise intranets) do not require full integration in order to provide useful services, motivating a Pay-as-You-Go approach to integration. With that approach, a system starts with very few (or inaccurate) semantic mappings and these mappings are improved over time as deemed necessary. This paper describes the first completely self-configuring data integration system. The goal of our work is to investigate how advanced a starting point we can provide for a Pay-as-You-Go system. Our system is based on the new concept of a probabilistic mediated schema that is automatically created from the data sources. We automatically create probabilistic schema mappings between the sources and the mediated schema. We describe experiments in multiple domains, including 50-800 data sources, and show that our system is able to produce high-quality answers with no human intervention.
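Automatic creation of a mediated schema from the sources, as described above, can be sketched crudely by clustering source attribute names. This is an illustrative sketch under strong simplifications, not the paper's probabilistic mediated schema: a string-similarity measure stands in for the probabilistic treatment of uncertain matches, and the attribute names are invented.

```python
# Sketch: bootstrap a mediated schema by greedy single-link clustering of
# attribute names collected from all sources; each cluster becomes one
# mediated attribute.
from difflib import SequenceMatcher

def similar(a, b, threshold=0.7):
    """Crude proxy for match probability between two attribute names."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

def cluster_attributes(source_attrs, threshold=0.7):
    """Greedily group attribute names; each group is a mediated attribute."""
    clusters = []
    for attr in source_attrs:
        for cluster in clusters:
            if any(similar(attr, member, threshold) for member in cluster):
                cluster.append(attr)
                break
        else:
            clusters.append([attr])
    return clusters

attrs = ["phone", "phone-no", "email", "e-mail", "address"]
mediated = cluster_attributes(attrs)
```

The paper's contribution is precisely to keep the uncertainty in such groupings, entertaining multiple candidate mediated schemas with probabilities rather than committing to one clustering as this sketch does.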

  • Pay-as-You-Go User Feedback for Dataspace Systems
    International Conference on Management of Data, 2008
    Co-Authors: Shawn R. Jeffery, Michael J. Franklin, Alon Halevy
    Abstract:

    A primary challenge to large-scale data integration is creating semantic equivalences between elements from different data sources that correspond to the same real-world entity or concept. Dataspaces propose a Pay-as-You-Go approach: automated mechanisms such as schema matching and reference reconciliation provide initial correspondences, termed candidate matches, and then user feedback is used to incrementally confirm these matches. The key to this approach is to determine in what order to solicit user feedback for confirming candidate matches. In this paper, we develop a decision-theoretic framework for ordering candidate matches for user confirmation using the concept of the value of perfect information (VPI). At the core of this concept is a utility function that quantifies the desirability of a given state; thus, we devise a utility function for dataspaces based on query result quality. We show in practice how to efficiently apply VPI in concert with this utility function to order user confirmations. A detailed experimental evaluation on both real and synthetic datasets shows that the ordering of user feedback produced by this VPI-based approach yields a dataspace with a significantly higher utility than a wide range of other ordering strategies. Finally, we outline the design of Roomba, a system that utilizes this decision-theoretic framework to guide a dataspace in soliciting user feedback in a Pay-as-You-Go manner.
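The VPI-based ordering described in this abstract can be sketched with a toy utility model. This is an illustrative sketch, not Roomba's utility function: utility here is a single gain/cost trade-off per match, under which the value of perfect information works out to min(p·gain, (1−p)·cost), so maximally uncertain matches are confirmed first.

```python
# Sketch: order candidate matches for user confirmation by value of
# perfect information (VPI) under a toy accept/ignore utility model.
# Match probabilities and identifiers are hypothetical.

def vpi(p, gain=1.0, cost=1.0):
    """VPI for one candidate match.

    Without feedback we either accept it (expected utility
    p*gain - (1-p)*cost) or ignore it (utility 0). With perfect
    information we accept only true matches (expected utility p*gain).
    VPI is the difference between the two expectations."""
    before = max(p * gain - (1 - p) * cost, 0.0)
    after = p * gain
    return after - before

def order_candidates(candidates, gain=1.0, cost=1.0):
    """Solicit user feedback on the highest-VPI candidates first."""
    return sorted(candidates, key=lambda c: vpi(c["p"], gain, cost), reverse=True)

candidates = [
    {"id": "A=B", "p": 0.95},  # near-certain match: little to learn
    {"id": "C=D", "p": 0.50},  # maximally uncertain: ask about this first
    {"id": "E=F", "p": 0.10},  # near-certain non-match
]
ordered = [c["id"] for c in order_candidates(candidates)]
```

The dataspace-specific part of the paper is the utility function itself, which is grounded in query result quality rather than this flat per-match gain and cost.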

Øystein Thøgersen - One of the best experts on this subject based on the ideXlab platform.