Data Federation - Explore the Science & Experts

The experts below are selected from a list of 49377 experts worldwide ranked by the ideXlab platform.

Stuart E Madnick - One of the best experts on this subject based on the ideXlab platform.

General Strategy for Querying Web Sources in a Data Federation Environment

Theoretical and Practical Advances in Information Systems Development, 2011

Co-Authors: Aykut Firat, Stuart E Madnick

Abstract:

Modern database management systems support the inclusion and querying of non-relational sources within a data federation environment via wrappers. Wrapper development for Web sources, however, entangles code with extraction and query-planning knowledge, which makes it a daunting task. We use the IBM DB2 federation engine to demonstrate the challenges of incorporating Web sources into a data federation. We then present a practical and general strategy for including and querying Web sources without requiring any changes to the underlying data federation technology. This strategy separates the code from the knowledge in wrapper development by introducing a general-purpose, capabilities-aware mini query planner and a data extraction engine. As a result, Web sources can be included in a data federation system faster and maintained more easily.
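The separation of wrapper code from source knowledge described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `SourceSpec`, `can_answer`, `extract`, the example URL and the regular expression are all hypothetical names chosen for illustration. The knowledge about a source lives in a declarative record, while one generic function plays the role of the planner check and extraction engine.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SourceSpec:
    """Declarative knowledge about one Web source; every field is hypothetical."""
    name: str
    url_template: str           # placeholders name the inputs the site needs
    required_inputs: frozenset  # attributes that must be bound before a call
    row_pattern: str            # regex with named groups, one match per row


def can_answer(spec: SourceSpec, bindings: dict) -> bool:
    """Planner check: the source is callable only if its required inputs are bound."""
    return spec.required_inputs.issubset(bindings)


def extract(spec: SourceSpec, bindings: dict, fetch: Callable[[str], str]) -> list:
    """Generic extraction engine: build the URL, fetch the page, apply the pattern."""
    if not can_answer(spec, bindings):
        missing = sorted(spec.required_inputs - set(bindings))
        raise ValueError(f"{spec.name} needs bound inputs: {missing}")
    page = fetch(spec.url_template.format(**bindings))
    return [m.groupdict() for m in re.finditer(spec.row_pattern, page)]


# A stand-in for an HTTP client so the sketch runs offline.
def fake_fetch(url: str) -> str:
    return "<tr><td>IBM</td><td>120.5</td></tr>"


quotes = SourceSpec(
    name="quotes",
    url_template="https://example.org/quote?s={symbol}",
    required_inputs=frozenset({"symbol"}),
    row_pattern=r"<td>(?P<symbol>[A-Z]+)</td><td>(?P<price>[\d.]+)</td>",
)
rows = extract(quotes, {"symbol": "IBM"}, fake_fetch)
```

Under these assumptions, adding another Web source means writing a new `SourceSpec` record rather than new wrapper code.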

Reconciling Equational Heterogeneity Within a Data Federation

SSRN Electronic Journal, 2009

Co-Authors: Aykut Firat, Stuart E Madnick, Michael Siegel, Benjamin N. Grosof, Frank Manola

Abstract:

Mappings in most federated databases are conceptualized and implemented as black-box transformations between source schemas and a federated schema. This approach does not allow specific mappings to be declared once and reused in other situations. We present an alternative approach in which data-level mappings are represented independently of source and federated schemas, as a network between “contexts”. This compact representation expedites the data federation process through mapping reuse and the automated composition of mappings from simpler ones. We illustrate the benefits of mapping reuse and composition with an example that incorporates equational mappings and applies symbolic equation-solving techniques.
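A small Python sketch can make the idea of a context network concrete. It assumes, purely for illustration, that every mapping is linear (y = a*x + b), so that inversion and composition can be done in closed form; the paper's symbolic equation solving is more general than this stand-in. The class names, context names and the exchange rate are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Linear:
    """An equational mapping y = a*x + b between two contexts."""
    a: float
    b: float = 0.0

    def __call__(self, x: float) -> float:
        return self.a * x + self.b

    def inverse(self) -> "Linear":
        # Solve y = a*x + b for x.
        return Linear(1.0 / self.a, -self.b / self.a)

    def then(self, other: "Linear") -> "Linear":
        # other(self(x)) = other.a * (a*x + b) + other.b
        return Linear(other.a * self.a, other.a * self.b + other.b)


class ContextNetwork:
    """Mappings are declared once per pair of contexts and reused by composition."""

    def __init__(self) -> None:
        self.edges: dict = {}

    def declare(self, src: str, dst: str, mapping: Linear) -> None:
        self.edges.setdefault(src, {})[dst] = mapping
        self.edges.setdefault(dst, {})[src] = mapping.inverse()

    def mapping(self, src: str, dst: str) -> Linear:
        """Breadth-first search for a path, composing mappings along the way."""
        queue, seen = deque([(src, Linear(1.0))]), {src}
        while queue:
            node, composed = queue.popleft()
            if node == dst:
                return composed
            for nxt, step in self.edges.get(node, {}).items():
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, composed.then(step)))
        raise KeyError(f"no mapping path from {src} to {dst}")


net = ContextNetwork()
net.declare("usd_thousands", "usd", Linear(1000.0))  # scale-factor mapping
net.declare("usd", "eur", Linear(0.5))               # hypothetical exchange rate
to_eur = net.mapping("usd_thousands", "eur")
back = net.mapping("eur", "usd_thousands")
```

Only two mappings are declared, yet the network can answer requests in both directions between any pair of connected contexts, which is the reuse the abstract refers to.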

Querying Web Sources within a Data Federation

International Conference on Information Systems, 2006

Co-Authors: Aykut Firat, Tarik Alatovic, Stuart E Madnick

Abstract:

The Web is undoubtedly the largest and most diverse repository of data, but, unfortunately, it was not designed to offer the capabilities of traditional database management systems. In a true data federation, all types of data sources, such as relational databases and semi-structured websites, could be used together. IBM WebSphere uses the "request-reply-compensate" protocol to communicate with wrappers in a data federation. This protocol expects wrappers to reply to query requests by indicating the portion of each query they can answer. While this provides a very generic approach to data federation, it also requires the wrapper developer to handle some of the complexities of capability considerations through custom coding. Alternative approaches based on declarative capability restrictions have been proposed in the literature, but they have not found their way into commercial systems, perhaps because of their complexity. We offer a practical middle-ground solution to querying Web sources, using IBM's data federation system as an example. In place of a two-layered architecture consisting of wrapper and source layers, we propose moving the capability declaration from the wrapper layer to a single component between the wrapper and the native data source. The advantage of this three-layered architecture is that each new Web source only needs to register its capabilities with the capability-declaration component once, which saves the work of writing a new wrapper each time. The inclusion of Web sources can thus be accelerated without requiring any change to existing data federation technology.
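The request-reply-compensate exchange and the single capability-declaration component can be sketched as follows. This is an illustrative Python model only, not IBM's wrapper API: `Predicate`, `CapabilityRegistry`, `reply` and `compensate` are hypothetical names, and the capability model is reduced to (column, operator) pairs.

```python
import operator
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Predicate:
    column: str
    op: str
    value: object


@dataclass
class CapabilityRegistry:
    """The middle layer: each source registers its capabilities here once."""
    _caps: dict = field(default_factory=dict)

    def register(self, source: str, supported: set) -> None:
        # supported holds (column, operator) pairs the source can evaluate.
        self._caps[source] = frozenset(supported)

    def reply(self, source: str, predicates: list) -> tuple:
        """Split a request into the part the source accepts and the rest."""
        caps = self._caps.get(source, frozenset())
        accepted = [p for p in predicates if (p.column, p.op) in caps]
        rejected = [p for p in predicates if (p.column, p.op) not in caps]
        return accepted, rejected


OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def compensate(rows: list, rejected: list) -> list:
    """The federation engine applies whatever the source could not."""
    return [r for r in rows
            if all(OPS[p.op](r[p.column], p.value) for p in rejected)]


registry = CapabilityRegistry()
registry.register("bookstore", {("author", "=")})

request = [Predicate("author", "=", "Knuth"), Predicate("price", "<", 50)]
accepted, rejected = registry.reply("bookstore", request)

# Rows as the source would return them after applying only the accepted part.
source_rows = [
    {"author": "Knuth", "title": "Volume 1", "price": 80},
    {"author": "Knuth", "title": "Selected Papers", "price": 35},
]
result = compensate(source_rows, rejected)
```

In this sketch the generic wrapper never changes; supporting another Web source takes one more `register` call.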

ICIS - Querying Web-Sources within a Data Federation

SSRN Electronic Journal, 2006

Co-Authors: Aykut Firat, Tarik Alatovic, Stuart E Madnick

Querying Web-Sources within a Data Federation

Web-based Information Systems and Applications, 2006

Co-Authors: Aykut Firat, Tarik Alatovic, Stuart E Madnick


Aykut Firat - One of the best experts on this subject based on the ideXlab platform.

General Strategy for Querying Web Sources in a Data Federation Environment

Theoretical and Practical Advances in Information Systems Development, 2011

Co-Authors: Aykut Firat, Stuart E Madnick

Reconciling Equational Heterogeneity Within a Data Federation

SSRN Electronic Journal, 2009

Co-Authors: Aykut Firat, Stuart E Madnick, Michael Siegel, Benjamin N. Grosof, Frank Manola

Querying Web Sources within a Data Federation

International Conference on Information Systems, 2006

Co-Authors: Aykut Firat, Tarik Alatovic, Stuart E Madnick

ICIS - Querying Web-Sources within a Data Federation

SSRN Electronic Journal, 2006

Co-Authors: Aykut Firat, Tarik Alatovic, Stuart E Madnick

Querying Web-Sources within a Data Federation

Web-based Information Systems and Applications, 2006

Co-Authors: Aykut Firat, Tarik Alatovic, Stuart E Madnick


Christopher J O Baker - One of the best experts on this subject based on the ideXlab platform.

Applied Ontologies for Global Health Surveillance and Pandemic Intelligence

2020

Co-Authors: Christopher J O Baker, Jon Hael Brenas, Kate Zinszer, Mohammad Sadnan Al Manir, Arash Shaban-nejad

Abstract:

Global health surveillance and pandemic intelligence rely on the systematic collection and integration of data from diverse, distributed, and heterogeneous sources at various levels of granularity. These sources include data from multiple disciplines represented in different formats, languages, and structures, posing significant integration challenges. This article provides an overview of challenges in data-driven surveillance. Using malaria surveillance as a use case, we highlight the contribution made by emerging semantic data federation technologies that offer enhanced interoperability, interpretability, and explainability through the adoption of ontologies. The paper concludes with a focus on the relevance of these technologies for ongoing pandemic preparedness initiatives.

Decision Support for Agricultural Consultants With Semantic Data Federation

International Journal of Agricultural and Environmental Information Systems, 2018

Co-Authors: Mohammad Sadnan Al Manir, Bruce Spencer, Christopher J O Baker

Abstract:

The informational needs of agricultural consultants are increasingly complex. Advising farmers on the appropriate measures for optimizing cropping yields demands access to custom data archives and analytics tools. As the number of archives grows, the expertise required of consultants goes beyond the capabilities of these non-technical agri-specialists. These end users have diverse ad hoc query needs and require tools that provide simple access to distributed data silos and easy ways to integrate relevant information. In this article, the authors report on a pilot deployment of Semantic Automated Discovery and Integration (SADI) Web services for the federation and computation of agricultural data. A registry of 9 SADI Web services was deployed to expose data from a variety of data resources in support of a defined set of query needs. The authors demonstrate that deploying these services facilitates the ad hoc creation and execution of mission-critical workflows targeting use cases in agricultural operations management. Using HYDRA, a semantic query engine for SADI Web services with a custom-built graphical user interface, agricultural consultants can identify optimal crop varieties and compute the profit margin of each variety using a complex cost model.

Exploring Semantic Data Federation to Enable Malaria Surveillance Queries

Medical Informatics Europe, 2018

Co-Authors: Jon Hael Brenas, Christopher J O Baker, Mohammad Sadnan Al Manir, Kate Zinszer, Arash Shaban-nejad

Abstract:

Malaria is an infectious disease affecting people across tropical countries. To devise efficient interventions, surveillance experts need to be able to answer increasingly complex queries that integrate information from repositories distributed all over the globe. This, in turn, requires extraordinary coding abilities that cannot be expected of non-technical surveillance experts. In this paper, we present a deployment of Semantic Automated Discovery and Integration (SADI) Web services for the federation and querying of malaria data. More than 10 services were created to answer an example query requiring data from various sources. Our method assists surveillance experts in formulating their queries and gaining access to the answers they need.

MIE - Exploring Semantic Data Federation to Enable Malaria Surveillance Queries.

Studies in health technology and informatics, 2018

Co-Authors: Jon Hael Brenas, Christopher J O Baker, Mohammad Sadnan Al Manir, Kate Zinszer, Arash Shaban-nejad

Startup Pitch: HYDRA - an Engine for Ad Hoc Querying and Data Federation for Bioinformatics and Clinical Intelligence

2012

Co-Authors: Christopher J O Baker

Abstract:

Need for Data Federation and Self-Service Querying. Finding and integrating information from multiple heterogeneous and distributed resources accounts for a large share of the time and cost of bioinformatics and cheminformatics knowledge management activities. Typically, users need to draw information from hundreds of completely autonomous resources, such as online biomedical databases, nomenclatures, clinical databases and specialised analytical Web services. State-of-the-art approaches to data integration, namely data warehousing and workflow scripting, are both costly and limited in scope. In the data federation paradigm, querying multiple heterogeneous distributed resources is the same as querying a single database. In addition, real-world business use cases require querying to be self-service, so that non-technical users such as scientists and biotechnologists can run ad hoc queries without help from programmers. The need for self-service querying is equally acute in the Clinical Intelligence context, where data (typically relational and textual) has to be analysed by clinical trial professionals, health care managers, surveillance practitioners and clinical researchers.

SADI Semantic Web services. Our solution leverages the SADI (Semantic Automated Discovery and Integration) framework, a set of conventions that turn simple HTTP-based Web services into Semantic Web services that can be discovered, composed and called fully automatically by client programs. In practice, this means that numerous databases and algorithms can be queried as a single database. This is achieved by associating with each SADI service a description that unambiguously defines what the service does, so that client programs can discover the service when they need the corresponding functionality. The schema of the virtual database represented by a network of SADI services is essentially a controlled vocabulary containing concepts and relations from the subject domain (e.g., biology, chemistry or health care), which non-technical users can understand, unlike most real-life relational and XML database schemas. This semantic exposition of the data represented by networks of SADI services facilitates self-service ad hoc querying. SADI has been developed since 2007 by several bioinformatics laboratories in North America and Europe. It currently comprises high-quality open-source libraries for easy service creation, infrastructure for service discovery and a few client program prototypes. Several case studies have been performed to test the technology in data integration scenarios in genomics, cheminformatics, lipidomics, toxicology and Clinical Intelligence. More than 600 SADI services for bioinformatics and cheminformatics have been created.

Startup, Technology and Products. IPSNP Computing Inc., based in Saint John, Canada, was set up to commercialize prior university-based research on data federation and semantic querying with SADI. The core technology is a high-performance query engine (working title HYDRA) operating on networks of SADI services representing various distributed resources. HYDRA will be packaged and licensed as two products: an intuitive end-user-oriented querying and data browsing tool, including a software-as-a-service edition, and an OEM-oriented Java toolkit. IPSNP will target the bioinformatics and Clinical Intelligence markets and, later, other verticals requiring self-service ad hoc federated querying.
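The description-driven discovery mentioned in this pitch can be pictured with a short Python sketch. It does not use the SADI libraries or HYDRA; `ServiceDescription`, `Registry` and `resolve` are hypothetical names, and a service description is reduced to an input type plus the one property the service attaches to its input. The gene and protein identifiers are made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ServiceDescription:
    """What a service does: it takes an instance of input_type and adds one property."""
    name: str
    input_type: str
    adds_property: str
    call: Callable[[str], list]


class Registry:
    def __init__(self) -> None:
        self._services: list = []

    def publish(self, service: ServiceDescription) -> None:
        self._services.append(service)

    def discover(self, input_type: str, wanted_property: str) -> list:
        """Find services by description alone, without knowing them in advance."""
        return [s for s in self._services
                if s.input_type == input_type and s.adds_property == wanted_property]


def resolve(registry: Registry, subject: str, input_type: str, wanted: str) -> list:
    """Answer one query pattern by calling every service that can supply it."""
    values: list = []
    for service in registry.discover(input_type, wanted):
        values.extend(service.call(subject))
    return values


# Hypothetical backing data standing in for two autonomous resources.
GENE_TO_PROTEIN = {"gene:G1": ["protein:P1"]}
PROTEIN_TO_PATHWAY = {"protein:P1": ["pathway:W7", "pathway:W9"]}

registry = Registry()
registry.publish(ServiceDescription(
    "gene2protein", "Gene", "encodes", lambda g: GENE_TO_PROTEIN.get(g, [])))
registry.publish(ServiceDescription(
    "protein2pathway", "Protein", "participatesIn",
    lambda p: PROTEIN_TO_PATHWAY.get(p, [])))

# Chain two discovered services, as a query engine might for a two-step query.
proteins = resolve(registry, "gene:G1", "Gene", "encodes")
pathways = [w for p in proteins
            for w in resolve(registry, p, "Protein", "participatesIn")]
```

The client code names only types and properties from the domain vocabulary, never a specific service, which is what lets a network of services be queried as if it were a single database.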


Bertram Ludascher - One of the best experts on this subject based on the ideXlab platform.

DataONE: A Data Federation with Provenance Support

International Provenance and Annotation Workshop, 2016

Co-Authors: Yang Cao, Christopher Jones, Víctor Cuevas-vicenttín, Matthew Jones, Bertram Ludascher, Timothy Mcphillips, Paolo Missier, Christopher R Schwalm, Peter Slaughter, Dave Vieglais

Abstract:

DataONE is a federated data network focusing on earth and environmental science data. We present the provenance and search features of DataONE by means of an example involving three earth scientists who interact through a DataONE Member Node. DataONE provenance systems enable reproducible research and facilitate proper attribution of scientific results transitively across generations of derived data products.
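Transitive attribution across generations of derived data products amounts to a graph traversal over derivation links. The Python sketch below is a generic illustration, not DataONE's provenance API: the function name, the two dictionaries and the three scientists' names are hypothetical.

```python
def transitive_attribution(product: str, derived_from: dict, attributed_to: dict) -> set:
    """Collect everyone credited for a product or for any of its ancestors."""
    credited, stack, seen = set(), [product], set()
    while stack:
        item = stack.pop()
        if item in seen:
            continue  # guards against revisiting shared ancestors
        seen.add(item)
        credited |= attributed_to.get(item, set())
        stack.extend(derived_from.get(item, ()))
    return credited


# Hypothetical lineage: a figure derived from model output derived from raw data.
derived_from = {
    "figure": ["model_output"],
    "model_output": ["raw_observations"],
}
attributed_to = {
    "figure": {"Carol"},
    "model_output": {"Bob"},
    "raw_observations": {"Alice"},
}
credits = transitive_attribution("figure", derived_from, attributed_to)
```

Because the traversal follows every derivation link, the scientist who collected the raw observations is still credited two generations later.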

IPAW - DataONE: A Data Federation with Provenance Support

Lecture Notes in Computer Science, 2016

Co-Authors: Yang Cao, Christopher Jones, Matthew Jones, Bertram Ludascher, Timothy Mcphillips, Paolo Missier, Christopher R Schwalm, Peter Slaughter, Víctor Cuevas-vicenttín, Dave Vieglais

TaPP - D-PROV: extending the PROV provenance model with workflow structure

2013

Co-Authors: Paolo Missier, Víctor Cuevas-vicenttín, Saumen Dey, Khalid Belhajjame, Bertram Ludascher

Abstract:

This paper presents an extension to the W3C PROV provenance model, aimed at representing process structure. Although the modelling of process structure is out of the scope of the PROV specification, it is beneficial when capturing and analyzing the provenance of data that is produced by programs or other formally encoded processes. In the paper, we motivate the need for such an extended model in the context of an ongoing large data federation and preservation project, DataONE, where provenance traces of scientific workflow runs are captured and stored alongside the data products. We introduce new provenance relations for modelling process structure along with their usage patterns, and present sample queries that demonstrate their benefit.


Christopher G. Chute - One of the best experts on this subject based on the ideXlab platform.

Applying semantic web technologies for phenome-wide scan using an electronic health record linked Biobank

Journal of Biomedical Semantics, 2012

Co-Authors: Jyotishman Pathak, Richard C. Kiefer, Suzette J Bielinski, Christopher G. Chute

Abstract:

Background: The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form “biobanks” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation.

Results: In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes and Hypothyroidism to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries.

Conclusions: This study demonstrates how Semantic Web technologies can be applied in conjunction with clinical data stored in EHRs to accurately identify subjects with specific diseases and phenotypes, and identify genotype-phenotype associations.
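The federated query pattern described in the Results can be sketched in plain Python without an RDF library. Two in-memory sets of (subject, predicate, object) triples stand in for separate endpoints; the predicate names, patient identifiers and the `match` and `federated_cohort` helpers are hypothetical, and a real deployment would issue queries over standardized Web protocols rather than loop over local sets.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])]


# Hypothetical endpoint 1: diagnoses extracted from electronic health records.
ehr_endpoint = {
    ("patient:1", "hasDiagnosis", "T2D"),
    ("patient:2", "hasDiagnosis", "Hypothyroidism"),
    ("patient:3", "hasDiagnosis", "T2D"),
}

# Hypothetical endpoint 2: which subjects have genotype data available.
genotype_endpoint = {
    ("patient:1", "genotypedOn", "array:A"),
    ("patient:2", "genotypedOn", "array:A"),
}


def federated_cohort(diagnosis: str) -> list:
    """Join the two sources on the shared subject identifier."""
    diagnosed = {s for s, _, _ in match(ehr_endpoint, p="hasDiagnosis", o=diagnosis)}
    genotyped = {s for s, _, _ in match(genotype_endpoint, p="genotypedOn")}
    return sorted(diagnosed & genotyped)


t2d_cohort = federated_cohort("T2D")
```

The join works only because both sources identify subjects with the same identifiers, which is the role shared URIs play when data is represented in RDF.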

Using semantic web technologies for cohort identification from electronic health records for clinical research.

AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, 2012

Co-Authors: Jyotishman Pathak, Richard C. Kiefer, Christopher G. Chute

Abstract:

The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. One of the key requirements to perform GWAS is the identification of subject cohorts with accurate classification of disease phenotypes. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical data stored in electronic health records (EHRs) to accurately identify subjects with specific diseases for inclusion in cohort studies. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR data and enabling federated querying and inferencing via standardized Web protocols for identifying subjects with Diabetes Mellitus. Our study highlights the potential of using Web-scale data federation approaches to execute complex queries.

