The Experts below are selected from a list of 49377 Experts worldwide ranked by ideXlab platform
Stuart E Madnick - One of the best experts on this subject based on the ideXlab platform.
-
General Strategy for Querying Web Sources in a Data Federation Environment
Theoretical and Practical Advances in Information Systems Development, 2011. Co-Authors: Aykut Firat, Stuart E Madnick. Abstract: Modern database management systems support the inclusion and querying of non-relational sources within a data federation environment via wrappers. Wrapper development for Web sources, however, is a convolution of code with extraction and query planning knowledge and becomes a daunting task. We use the IBM DB2 federation engine to demonstrate the challenges of incorporating Web sources into a data federation. We then present a practical and general strategy for the inclusion and querying of Web sources without requiring any changes in the underlying data federation technology. This strategy separates the code and knowledge in wrapper development by introducing a general-purpose capabilities-aware mini query-planner and a data extraction engine. As a result, Web sources can be included in a data federation system faster and maintained more easily.
-
Reconciling Equational Heterogeneity Within a Data Federation
SSRN Electronic Journal, 2009. Co-Authors: Aykut Firat, Stuart E Madnick, Michael Siegel, Benjamin N. Grosof, Frank Manola. Abstract: Mappings in most federated databases are conceptualized and implemented as black-box transformations between source schemas and a federated schema. This approach does not allow specific mappings to be declared once and reused in other situations. We present an alternative approach, in which data-level mappings are represented independent of source and federated schemas as a network between “contexts”. This compendious representation expedites the data federation process via mapping reuse and automated mapping composition from simpler mappings. We illustrate the benefits of mapping reuse and composition by using an example that incorporates equational mappings and the application of symbolic equation solving techniques.
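The idea of declaring a mapping once and deriving others by composition can be sketched as follows. This is only an illustration under stated assumptions, not the authors' system: the context names (`usd_thousands`, `usd`, `usd_cents`) and all function names are hypothetical, and each mapping is restricted to a linear equation y = scale * x + offset so that inversion can be done by hand instead of with a symbolic equation solver.

```python
from collections import deque
from fractions import Fraction

# Hypothetical declared mappings between contexts. Each one is a linear
# equation y = scale * x + offset, stored as (scale, offset).
DECLARED = {
    ("usd_thousands", "usd"): (Fraction(1000), Fraction(0)),
    ("usd", "usd_cents"): (Fraction(100), Fraction(0)),
}


def invert(mapping):
    """Solve y = scale * x + offset for x, giving the reverse mapping."""
    scale, offset = mapping
    return (1 / scale, -offset / scale)


def compose(first, second):
    """Return the mapping equivalent to applying `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def build_graph(declared):
    """Build the context network, adding the inverse of every declared edge."""
    graph = {}
    for (src, dst), mapping in declared.items():
        graph.setdefault(src, {})[dst] = mapping
        graph.setdefault(dst, {})[src] = invert(mapping)
    return graph


def derive(declared, src, dst):
    """Breadth-first search for a chain of mappings from src to dst.

    Returns the composed (scale, offset) pair, or None if no chain exists.
    """
    graph = build_graph(declared)
    identity = (Fraction(1), Fraction(0))
    queue = deque([(src, identity)])
    seen = {src}
    while queue:
        context, mapping = queue.popleft()
        if context == dst:
            return mapping
        for neighbor, step in graph.get(context, {}).items():
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, compose(mapping, step)))
    return None


def convert(value, mapping):
    """Apply a (scale, offset) mapping to a value."""
    scale, offset = mapping
    return scale * value + offset
```

With only two declared mappings, `derive(DECLARED, "usd_thousands", "usd_cents")` composes them, and `derive(DECLARED, "usd_cents", "usd_thousands")` is obtained from the inverses without any extra declaration.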
-
Querying Web Sources within a Data Federation
International Conference on Information Systems, 2006. Co-Authors: Aykut Firat, Tarik Alatovic, Stuart E Madnick. Abstract: The web is undoubtedly the largest and most diverse repository of data, but it was not designed to offer the capabilities of traditional database management systems, which is unfortunate. In a true data federation, all types of data sources, such as relational databases and semi-structured websites, could be used together. IBM WebSphere uses the "request-reply-compensate" protocol to communicate with wrappers in a data federation. This protocol expects wrappers to reply to query requests by indicating the portion of the queries they can answer. While this provides a very generic approach to data federation, it also requires the wrapper developer to deal with some of the complexities of capability considerations through custom coding. Alternative approaches based on declarative capability restrictions have been proposed in the literature, but they have not found their way into commercial systems, perhaps due to their complexity. We offer a practical middle-ground solution to querying web-sources, using IBM's data federation system as an example. In lieu of a two-layered architecture consisting of wrapper and source layers, we propose to move the capability declaration from the wrapper layer to a single component between the wrapper and the native data source. The advantage of this three-layered architecture is that each new web-source only needs to register its capability with the capability-declaration component once, which saves the work of writing a new wrapper each time. Thus the inclusion of web-sources through this mechanism can be accelerated in a way that doesn't require a change in existing data federation technology.
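The request-reply-compensate exchange and the single capability-declaration component described in this abstract can be sketched roughly as below. This is a toy model under stated assumptions, not IBM's wrapper API: the registry layout, the source name `book_site`, the in-memory rows standing in for a web source, and every function name are hypothetical.

```python
import operator
from dataclasses import dataclass

# Comparison operators the toy engine understands.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


@dataclass(frozen=True)
class Predicate:
    column: str
    op: str
    value: object


# Hypothetical capability-declaration component: each source registers once
# which (column, operator) pairs it can evaluate natively.
CAPABILITIES = {
    "book_site": {("author", "=")},
}

# Stand-in for the rows a web source would return (assumed sample data).
FAKE_SOURCE_ROWS = [
    {"author": "Firat", "title": "A", "price": 25},
    {"author": "Firat", "title": "B", "price": 40},
    {"author": "Madnick", "title": "C", "price": 20},
]


def matches(row, predicates):
    """True if the row satisfies every predicate in the list."""
    return all(OPS[p.op](row[p.column], p.value) for p in predicates)


def reply(source, predicates):
    """Reply step: split the request into the part the source can answer
    (accepted) and the part it cannot (rejected)."""
    supported = CAPABILITIES.get(source, set())
    accepted = [p for p in predicates if (p.column, p.op) in supported]
    rejected = [p for p in predicates if (p.column, p.op) not in supported]
    return accepted, rejected


def fetch(source, accepted):
    """Simulate the source evaluating only the accepted predicates."""
    return [row for row in FAKE_SOURCE_ROWS if matches(row, accepted)]


def federated_query(source, predicates):
    """Request, reply, then compensate for the rejected predicates locally."""
    accepted, rejected = reply(source, predicates)
    rows = fetch(source, accepted)
    return [row for row in rows if matches(row, rejected)]
```

In this sketch, adding another web source means adding one entry to `CAPABILITIES` rather than writing new wrapper logic, which is the design point the abstract argues for.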
-
ICIS - Querying Web-Sources within a Data Federation
SSRN Electronic Journal, 2006. Co-Authors: Aykut Firat, Tarik Alatovic, Stuart E Madnick.
-
Querying Web-Sources within a Data Federation Web-based Information Systems and Applications
2006. Co-Authors: Aykut Firat, Tarik Alatovic, Stuart E Madnick.
Aykut Firat - One of the best experts on this subject based on the ideXlab platform.
-
General Strategy for Querying Web Sources in a Data Federation Environment
Theoretical and Practical Advances in Information Systems Development, 2011. Co-Authors: Aykut Firat, Stuart E Madnick.
-
Reconciling Equational Heterogeneity Within a Data Federation
SSRN Electronic Journal, 2009. Co-Authors: Aykut Firat, Stuart E Madnick, Michael Siegel, Benjamin N. Grosof, Frank Manola.
-
Querying Web Sources within a Data Federation
International Conference on Information Systems, 2006. Co-Authors: Aykut Firat, Tarik Alatovic, Stuart E Madnick.
-
ICIS - Querying Web-Sources within a Data Federation
SSRN Electronic Journal, 2006. Co-Authors: Aykut Firat, Tarik Alatovic, Stuart E Madnick.
-
Querying Web-Sources within a Data Federation Web-based Information Systems and Applications
2006. Co-Authors: Aykut Firat, Tarik Alatovic, Stuart E Madnick.
Christopher J O Baker - One of the best experts on this subject based on the ideXlab platform.
-
Applied Ontologies for Global Health Surveillance and Pandemic Intelligence
2020. Co-Authors: Christopher J O Baker, Jon Hael Brenas, Kate Zinszer, Mohammad Sadnan Al Manir, Arash Shaban-nejad. Abstract: Global health surveillance and pandemic intelligence rely on the systematic collection and integration of data from diverse distributed and heterogeneous sources at various levels of granularity. These sources include data from multiple disciplines represented in different formats, languages, and structures, posing significant integration challenges. This article provides an overview of challenges in data-driven surveillance. Using malaria surveillance as a use case, we highlight the contribution made by emerging semantic data federation technologies that offer enhanced interoperability, interpretability and explainability through the adoption of ontologies. The paper concludes with a focus on the relevance of these technologies for ongoing pandemic preparedness initiatives.
-
Decision Support for Agricultural Consultants With Semantic Data Federation
International Journal of Agricultural and Environmental Information Systems, 2018. Co-Authors: Mohammad Sadnan Al Manir, Bruce Spencer, Christopher J O Baker. Abstract: Informational needs of agricultural consultants are increasingly complex. Advising farmers on the appropriate measures for optimizing cropping yields demands access to custom data archives and analytics tools. In line with the increasing number of archives, the expertise required of consultants goes beyond the capabilities of these non-technical agri-specialists. These end users have diverse ad-hoc query needs and require tools that provide simple access to distributed data silos and easy ways to integrate relevant information. In this article, the authors report on a pilot deployment of Semantic Automated Discovery and Integration (SADI) Web services for the federation and computation of agricultural data. A registry of 9 SADI Web services was deployed to expose data from a variety of different data resources in support of a defined set of query needs. The authors demonstrate that the deployment of these services facilitates the ad-hoc creation and execution of mission-critical workflows targeting use cases in agricultural operations management. Using HYDRA, a semantic query engine for SADI Web services with a custom-built graphical user interface, agricultural consultants can identify optimal crop varieties and compute profit margins of each variety using a complex cost model.
-
Exploring Semantic Data Federation to Enable Malaria Surveillance Queries
Medical Informatics Europe, 2018. Co-Authors: Jon Hael Brenas, Christopher J O Baker, Mohammad Sadnan Al Manir, Kate Zinszer, Arash Shaban-nejad. Abstract: Malaria is an infectious disease affecting people across tropical countries. In order to devise efficient interventions, surveillance experts need to be able to answer increasingly complex queries integrating information coming from repositories distributed all over the globe. This, in turn, requires extraordinary coding abilities that cannot be expected from non-technical surveillance experts. In this paper, we present a deployment of Semantic Automated Discovery and Integration (SADI) Web services for the federation and querying of malaria data. More than 10 services were created to answer an example query requiring data coming from various sources. Our method assists surveillance experts in formulating their queries and gaining access to the answers they need.
-
MIE - Exploring Semantic Data Federation to Enable Malaria Surveillance Queries.
Studies in health technology and informatics, 2018. Co-Authors: Jon Hael Brenas, Christopher J O Baker, Mohammad Sadnan Al Manir, Kate Zinszer, Arash Shaban-nejad.
-
Startup Pitch: HYDRA - an Engine for Ad Hoc Querying and Data Federation for Bioinformatics and Clinical Intelligence
2012. Co-Authors: Christopher J O Baker. Abstract: Need for Data Federation and Self-Service Querying. Finding and integrating information from multiple heterogeneous and distributed resources accounts for a large share of time and costs in bioinformatic and cheminformatic knowledge management activities. Typically users need to draw information from hundreds of completely autonomous resources, such as online biomedical databases, nomenclatures, clinical databases and specialised analytical Web services. State-of-the-art approaches to data integration, namely data warehousing and workflow scripting, are both costly and limited in scope. In the data federation paradigm, query access to multiple heterogeneous distributed resources is the same as querying a single database. In addition, real-world business use cases require querying to be self-service, so that non-technical users (scientists and biotechnologists) can run ad hoc queries without help from programmers. The need for self-service querying is equally acute in the Clinical Intelligence context, where data, typically relational and textual, has to be analysed by clinical trial professionals, health care managers, surveillance practitioners and clinical researchers. SADI Semantic Web services. Our solution leverages the power of the SADI (Semantic Automated Discovery and Integration) framework, a set of conventions that turn simple HTTP-based Web services into Semantic Web services that can be fully automatically discovered, composed and called by client programs. Practically, this means that numerous databases and algorithms can be queried as a single database. This is achieved by associating a description with each SADI service that unambiguously defines what the service does, thus facilitating the discovery of the service by client programs when they need the corresponding functionality.
The schema of the virtual database represented by a network of SADI services is, essentially, a controlled vocabulary containing concepts and relations from the subject domain, e.g., biology, chemistry or health care, which can be understood by non-technical users, unlike most of the real-life relational and XML database schemas. This semantic exposition of the data represented by networks of SADI services facilitates self-service ad hoc querying. SADI has been developed since 2007 by several bioinformatics laboratories in North America and Europe. It currently comprises high-quality open-source libraries for easy service creation, infrastructure for service discovery and a few client program prototypes. Several case studies have been performed to test the technology in data integration scenarios in genomics, cheminformatics, lipidomics, toxicology and Clinical Intelligence. 600+ SADI services for bioinformatics and cheminformatics have been created. Startup, Technology and Products. IPSNP Computing Inc., based in Saint John, Canada, was set up to commercialize prior university-based research on data federation and semantic querying with SADI. The core technology is a high-performance query engine (working title HYDRA) operating on networks of SADI services representing various distributed resources. HYDRA will be packaged and licensed as two products: an intuitive end user-oriented querying and data browsing tool, including a software-as-a-service edition, and an OEM-oriented Java toolkit. IPSNP will target bioinformatics and Clinical Intelligence markets and, later, other verticals requiring self-service ad hoc federated querying.
Bertram Ludascher - One of the best experts on this subject based on the ideXlab platform.
-
DataONE: A Data Federation with Provenance Support
International Provenance and Annotation Workshop, 2016. Co-Authors: Yang Cao, Christopher Jones, Víctor Cuevas-vicenttín, Matthew Jones, Bertram Ludascher, Timothy Mcphillips, Paolo Missier, Christopher R Schwalm, Peter Slaughter, Dave Vieglais. Abstract: DataONE is a federated data network focusing on earth and environmental science data. We present the provenance and search features of DataONE by means of an example involving three earth scientists who interact through a DataONE Member Node. DataONE provenance systems enable reproducible research and facilitate proper attribution of scientific results transitively across generations of derived data products.
-
IPAW - DataONE: A Data Federation with Provenance Support
Lecture Notes in Computer Science, 2016. Co-Authors: Yang Cao, Christopher Jones, Matthew Jones, Bertram Ludascher, Timothy Mcphillips, Paolo Missier, Christopher R Schwalm, Peter Slaughter, Víctor Cuevas-vicenttín, Dave Vieglais.
-
TaPP - D-PROV: extending the PROV provenance model with workflow structure
2013. Co-Authors: Paolo Missier, Víctor Cuevas-vicenttín, Saumen Dey, Khalid Belhajjame, Bertram Ludascher. Abstract: This paper presents an extension to the W3C PROV provenance model, aimed at representing process structure. Although the modelling of process structure is out of the scope of the PROV specification, it is beneficial when capturing and analyzing the provenance of data that is produced by programs or other formally encoded processes. In the paper, we motivate the need for such an extended model in the context of an ongoing large data federation and preservation project, DataONE, where provenance traces of scientific workflow runs are captured and stored alongside the data products. We introduce new provenance relations for modelling process structure along with their usage patterns, and present sample queries that demonstrate their benefit.
Christopher G. Chute - One of the best experts on this subject based on the ideXlab platform.
-
Applying semantic web technologies for phenome-wide scan using an electronic health record linked Biobank
Journal of Biomedical Semantics, 2012. Co-Authors: Jyotishman Pathak, Richard C. Kiefer, Suzette J Bielinski, Christopher G. Chute. Abstract: Background: The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form “biobanks” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypotheses generation. Results: In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes and Hypothyroidism to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries. Conclusions: This study demonstrates how Semantic Web technologies can be applied in conjunction with clinical data stored in EHRs to accurately identify subjects with specific diseases and phenotypes, and identify genotype-phenotype associations.
-
Using semantic web technologies for cohort identification from electronic health records for clinical research.
AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, 2012. Co-Authors: Jyotishman Pathak, Richard C. Kiefer, Christopher G. Chute. Abstract: The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. One of the key requirements to perform GWAS is the identification of subject cohorts with accurate classification of disease phenotypes. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical data stored in electronic health records (EHRs) to accurately identify subjects with specific diseases for inclusion in cohort studies. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR data and enabling federated querying and inferencing via standardized Web protocols for identifying subjects with Diabetes Mellitus. Our study highlights the potential of using Web-scale data federation approaches to execute complex queries.