Data Lifecycle

The Experts below are selected from a list of 29,307 Experts worldwide, ranked by the ideXlab platform.

Yuri Demchenko - One of the best experts on this subject based on the ideXlab platform.

  • EDISON Data Science Framework (EDSF): Addressing Demand for Data Science and Analytics Competences for the Data Driven Digital Economy
    2021 IEEE Global Engineering Education Conference (EDUCON), 2021
    Co-Authors: Yuri Demchenko, Steve Brewer, Juan José Cuadrado Gallego, Tomasz Wiktorski
    Abstract:

    The emerging Data-driven economy, spanning industry, research and business, requires new types of specialists who are capable of supporting all stages of the Data Lifecycle, from Data production and input to Data processing and the delivery of actionable results, visualisation and reporting; these roles can be jointly defined as the Data Science family of professions. Data Science is becoming a newly recognised field of science that combines Data Analytics methods with the power of Big Data technologies and Cloud Computing, which together provide a basis for the effective use of Data-driven research and economy models. Data Science research and education require a multi-disciplinary approach and a Data-driven/centric paradigm shift. Besides core professional competences and knowledge in Data Science, the increasing digitalisation of science and industry also requires new types of workplace and professional skills, raising the importance of the critical thinking, problem solving and creativity needed to work in highly automated and dynamic environments. The education and training of Data-related professions must reflect all the multi-disciplinary knowledge and competences required from Data Science and Data-handling practitioners in modern, Data-driven research and the digital economy. Under modern conditions of fast technology change and strong skills demand, Data Science education and training should be customizable and delivered in multiple forms, with sufficient lab facilities for practical training. This paper discusses aspects of building customizable and interoperable Data Science curricula for different types of learners and target application domains. The proposed approach is based on the EDISON Data Science Framework (EDSF), initially developed in the EU-funded Project EDISON and currently maintained by the EDISON Community Initiative.
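
The Data Lifecycle stages enumerated in the abstract (production and input, processing, and the delivery of actionable results with visualisation and reporting) can be sketched as a simple ordered structure; this is an illustrative sketch only, and the constant and function names are our own, not from the paper:

```python
# Stages of the Data Lifecycle as enumerated in the abstract, in order.
DATA_LIFECYCLE_STAGES = [
    "data production and input",
    "data processing",
    "actionable results delivery",
    "visualisation and reporting",
]

def stage_index(stage: str) -> int:
    """Return the 0-based position of a stage within the lifecycle."""
    return DATA_LIFECYCLE_STAGES.index(stage)

def downstream_stages(stage: str) -> list:
    """Return the stages that follow the given one in the lifecycle."""
    return DATA_LIFECYCLE_STAGES[stage_index(stage) + 1:]
```

A Data Science professional, in the paper's framing, is someone able to support every stage of this sequence rather than a single step.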

  • EDISON Data Science Framework (EDSF) Extension to Address Transversal Skills Required by Emerging Industry 4.0 Transformation
    2019 15th International Conference on eScience (eScience), 2019
    Co-Authors: Yuri Demchenko, Tomasz Wiktorski, Juan Cuadrado Gallego, Steve Brewer
    Abstract:

    The emerging Data-driven economy (also defined as Industry 4.0, or simply 4IR), encompassing industry, research and business, requires new types of specialists who are able to support all stages of the Data Lifecycle, from Data production and input to Data processing and the delivery of actionable results, visualisation and reporting; these roles can be collectively defined as the Data Science family of professions. Data Science as a research and academic discipline provides a basis for Data Analytics and ML/AI applications. The education and training of Data-related professions must reflect all the multi-disciplinary knowledge and competences required from Data Science and Data-handling practitioners in modern, Data-driven research and the digital economy. In the modern era, with ever faster technology changes matched by strong skills demand, Data Science education and training programmes should be customizable and deliverable in multiple forms, tailored to different categories of professional roles and profiles. Referring to the authors' other publications on building customizable and interoperable Data Science curricula for different types of learners and target application domains, this paper focuses on defining the set of transversal competences and skills required from modern and future Data Science professions. These include the workplace and professional skills of critical thinking, problem solving and creativity required to work in highly automated and dynamic environments. The proposed approach is based on the EDISON Data Science Framework (EDSF), initially developed within the EU-funded Project EDISON and currently being further developed in the EU-funded MATES and FAIRsFAIR projects.

  • Customisable Data Science Educational Environment: From Competences Management and Curriculum Design to Virtual Labs On-Demand
    2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 2017
    Co-Authors: Yuri Demchenko, Cees De Laat, Adam Belloum, Tomasz Wiktorski, Charles Loomis, Erwin Spekschoor
    Abstract:

    Data Science is an emerging field of science that requires a multi-disciplinary approach and is based on Big Data and Data-intensive technologies, which together provide a basis for the effective use of Data-driven research and economy models. Modern Data-driven research and industry require new types of specialists who are capable of supporting all stages of the Data Lifecycle, from Data production and input to Data processing and the delivery of actionable results, visualisation and reporting; these roles can be jointly defined as the Data Science family of professions. The education and training of Data Scientists currently lacks a commonly accepted, harmonized instructional model that reflects all the multi-disciplinary knowledge and competences required from Data Science practitioners in modern, Data-driven research and the digital economy. The educational model and approach should also address the different needs of future professionals, covering both theoretical knowledge and practical skills, supported by a corresponding education infrastructure and educational lab environment. Under modern conditions of fast technology change and strong skills demand, Data Science education and training should be customizable and delivered in multiple forms, with sufficient Data lab facilities for practical training. This paper discusses both aspects: building a customizable Data Science curriculum for different types of learners, and proposing a hybrid model for virtual labs that combines local university facilities with cloud-based Big Data and Data analytics facilities and services on demand. The proposed approach is based on the EDISON Data Science Framework (EDSF) developed in the EU-funded Project EDISON and the CYCLONE cloud automation systems being developed in another EU-funded project, CYCLONE.

  • Architecture Framework and Components for the Big Data Ecosystem
    Journal of System and Network Engineering, 2013
    Co-Authors: Yuri Demchenko, Canh Ngo, Peter Membrey
    Abstract:

    Big Data is becoming a new technology focus in both science and industry, motivating a shift to Data-centric architecture and operational models. There is a vital need to define the basic information/semantic models, architecture components and operational models that together comprise a so-called Big Data Ecosystem. This paper discusses the nature of Big Data, which may originate from different scientific, industrial and social activity domains, and proposes an improved Big Data definition that includes the following parts: Big Data properties (also called the Big Data 5Vs: Volume, Velocity, Variety, Value and Veracity), Data models and structures, Data analytics, infrastructure and security. The paper discusses the paradigm change from traditional host- or service-based to Data-centric architecture and operational models in Big Data. The Big Data Architecture Framework (BDAF) is proposed to address all aspects of the Big Data Ecosystem; it includes the following components: Big Data Infrastructure, Big Data Analytics, Data structures and models, Big Data Lifecycle Management, and Big Data Security. The paper analyses the requirements for these components and suggests how they can address the main Big Data challenges. The presented work aims to provide a consolidated view of the Big Data phenomenon and the challenges it poses to modern technologies, and to initiate a wide discussion.
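
The 5V properties and the BDAF components named in the abstract can be modelled as simple enumerations; this is a non-normative sketch, and the class and member names are our own, not from the paper:

```python
from enum import Enum

class BigDataProperty(Enum):
    """The 5V Big Data properties listed in the improved definition."""
    VOLUME = "Volume"
    VELOCITY = "Velocity"
    VARIETY = "Variety"
    VALUE = "Value"
    VERACITY = "Veracity"

class BDAFComponent(Enum):
    """Components of the Big Data Architecture Framework (BDAF)."""
    INFRASTRUCTURE = "Big Data Infrastructure"
    ANALYTICS = "Big Data Analytics"
    DATA_STRUCTURES = "Data structures and models"
    LIFECYCLE_MANAGEMENT = "Big Data Lifecycle Management"
    SECURITY = "Big Data Security"

# The improved definition groups the 5V properties alongside data models,
# analytics, infrastructure and security; a dict keeps that grouping explicit.
BIG_DATA_DEFINITION = {
    "properties": [p.value for p in BigDataProperty],
    "other parts": ["Data models and structures", "Data analytics",
                    "infrastructure", "security"],
}
```

Keeping the components as an enumeration makes it easy to check that a given architecture discussion covers every BDAF component.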

Mark M. Mcgilchrist - One of the best experts on this subject based on the ideXlab platform.

  • Possible Sources of Bias in Primary Care Electronic Health Record Data Use and Reuse
    Journal of Medical Internet Research, 2018
    Co-Authors: Robert A. Verheij, Vasa Curcin, Brendan Delaney, Mark M. Mcgilchrist
    Abstract:

    Background: Enormous amounts of Data are recorded routinely in health care as part of the care process, primarily for managing individual patient care. There are significant opportunities to use these Data for other purposes, many of which would contribute to establishing a learning health system. This is particularly true for Data recorded in primary care settings, as in many countries these are the first place patients turn to for most health problems. Objective: In this paper, we discuss whether Data that are recorded routinely as part of the health care process in primary care are actually fit for other purposes, such as research and quality-of-care indicators; how the original purpose may affect the extent to which the Data are fit for another purpose; and the mechanisms behind these effects. In doing so, we want to identify possible sources of bias that are relevant to the use and reuse of these types of Data. Methods: This paper is based on the authors' experience as users of electronic health record Data and as general practitioners, health informatics experts, and health services researchers. It is a product of the discussions they had during the Translational Research and Patient Safety in Europe (TRANSFoRm) project, which was funded by the European Commission and sought to develop, pilot, and evaluate a core information architecture for the learning health system in Europe, based on primary care electronic health records. Results: We first describe the different stages in the processing of electronic health record Data, as well as the different purposes for which these Data are used. Given the different Data processing steps and purposes, we then discuss, for each individual Data processing step, the possible mechanisms that can generate biased outcomes. We identified 13 possible sources of bias. Four of them are related to the organization of a health care system, whereas others are of a more technical nature.
Conclusions: There are a substantial number of possible sources of bias, but very little is known about the size and direction of their impact. However, anyone who uses or reuses Data that were recorded as part of the health care process (such as researchers and clinicians) should be aware of the associated Data collection process and the environmental influences that can affect the quality of the Data. Our stepwise, actor- and purpose-oriented approach may help to identify these possible sources of bias. Unless Data quality issues are better understood and adequate controls are embedded throughout the Data Lifecycle, Data-driven health care will not live up to its expectations. We need a Data quality research agenda to devise the instruments needed to assess the magnitude of each of the possible sources of bias, and then to start measuring their impact. The possible sources of bias described in this paper serve as a starting point for this research agenda. [J Med Internet Res 2018;20(5):e185]

Robert A. Verheij - One of the best experts on this subject based on the ideXlab platform.

  • Possible Sources of Bias in Primary Care Electronic Health Record Data Use and Reuse
    Journal of Medical Internet Research, 2018
    Co-Authors: Robert A. Verheij, Vasa Curcin, Brendan Delaney, Mark M. Mcgilchrist
    Abstract:

    Background: Enormous amounts of Data are recorded routinely in health care as part of the care process, primarily for managing individual patient care. There are significant opportunities to use these Data for other purposes, many of which would contribute to establishing a learning health system. This is particularly true for Data recorded in primary care settings, as in many countries these are the first place patients turn to for most health problems. Objective: In this paper, we discuss whether Data that are recorded routinely as part of the health care process in primary care are actually fit for other purposes, such as research and quality-of-care indicators; how the original purpose may affect the extent to which the Data are fit for another purpose; and the mechanisms behind these effects. In doing so, we want to identify possible sources of bias that are relevant to the use and reuse of these types of Data. Methods: This paper is based on the authors' experience as users of electronic health record Data and as general practitioners, health informatics experts, and health services researchers. It is a product of the discussions they had during the Translational Research and Patient Safety in Europe (TRANSFoRm) project, which was funded by the European Commission and sought to develop, pilot, and evaluate a core information architecture for the learning health system in Europe, based on primary care electronic health records. Results: We first describe the different stages in the processing of electronic health record Data, as well as the different purposes for which these Data are used. Given the different Data processing steps and purposes, we then discuss, for each individual Data processing step, the possible mechanisms that can generate biased outcomes. We identified 13 possible sources of bias. Four of them are related to the organization of a health care system, whereas others are of a more technical nature.
Conclusions: There are a substantial number of possible sources of bias, but very little is known about the size and direction of their impact. However, anyone who uses or reuses Data that were recorded as part of the health care process (such as researchers and clinicians) should be aware of the associated Data collection process and the environmental influences that can affect the quality of the Data. Our stepwise, actor- and purpose-oriented approach may help to identify these possible sources of bias. Unless Data quality issues are better understood and adequate controls are embedded throughout the Data Lifecycle, Data-driven health care will not live up to its expectations. We need a Data quality research agenda to devise the instruments needed to assess the magnitude of each of the possible sources of bias, and then to start measuring their impact. The possible sources of bias described in this paper serve as a starting point for this research agenda. [J Med Internet Res 2018;20(5):e185]

Yong Wang - One of the best experts on this subject based on the ideXlab platform.

  • Develop Ten Security Analytics Metrics for Big Data on the Cloud
    Advances in Data Sciences, Security and Applications, 2020
    Co-Authors: Yong Wang, Bharat S Rawal, Qiang Duan
    Abstract:

    This paper reviews big Data security analytics and examines the big Data Lifecycle to assess big Data security challenges in clouds. The paper justifies the reasons for developing big Data security metrics and proposes ten metrics for securing big Data on the cloud. It contributes new knowledge to big Data security by tying security assessment to the right metrics.

  • Big Data Lifecycle Threats and Security Model
    Americas Conference on Information Systems, 2015
    Co-Authors: Yazan Alshboul, Raj Kumar Nepali, Yong Wang
    Abstract:

    Big Data is an emerging term referring to the process of managing huge amounts of Data from different sources, such as DBMSs, log files, and social media postings. Big Data (text, numbers, images, etc.) can take different forms: structured, semi-structured, and unstructured. It can be further described by attributes such as velocity, volume, variety, value, and complexity. The emerging big Data technologies also raise many security concerns and challenges. In this paper, we present a big Data Lifecycle framework. The Lifecycle includes four phases: Data collection, Data storage, Data analytics, and knowledge creation. We briefly introduce each phase and summarize the security threats and attacks it faces. The big Data Lifecycle is then integrated with these security threats and attacks to propose a security threat model for conducting research in big Data security. Our work could further be used towards securing big Data infrastructure. Keywords: big Data, big Data Lifecycle, threats and attacks, threat model.
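
The four-phase lifecycle described above can be sketched as a phase-to-threat mapping. The phase names come from the abstract; the example threats are common illustrations of the categories such a model covers, not the paper's own list, and the function name is our own:

```python
# The four big Data Lifecycle phases from the abstract, in order.
LIFECYCLE_PHASES = [
    "data collection",
    "data storage",
    "data analytics",
    "knowledge creation",
]

# Illustrative (not exhaustive) threats per phase; these are generic
# examples, not the specific threats enumerated in the paper.
EXAMPLE_THREATS = {
    "data collection": ["untrusted data sources", "eavesdropping in transit"],
    "data storage": ["unauthorized access", "data tampering"],
    "data analytics": ["privacy leakage from aggregation", "poisoned inputs"],
    "knowledge creation": ["misuse of derived knowledge", "result disclosure"],
}

def threat_model():
    """Pair each lifecycle phase with its example threats, in phase order."""
    return [(phase, EXAMPLE_THREATS[phase]) for phase in LIFECYCLE_PHASES]
```

Walking the phases in order mirrors the paper's approach of attaching threats and attacks to each lifecycle step to form a threat model.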

Line C Pouchard - One of the best experts on this subject based on the ideXlab platform.

  • Revisiting the Data Lifecycle with Big Data Curation
    International Journal of Digital Curation, 2016
    Co-Authors: Line C Pouchard
    Abstract:

    As science becomes more Data-intensive and collaborative, researchers increasingly use larger and more complex Data to answer research questions. The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges at an unprecedented scale. In parallel, research Data repositories have been built to host research Data in response to sponsors' requirements that research Data be publicly available. Libraries are re-inventing themselves to respond to a growing demand to manage, store, curate and preserve the Data produced in the course of publicly funded research. As librarians and Data managers develop the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced in the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity. We highlight the issues associated with each characteristic, particularly their impact on Data management and curation. Using the methodological framework of the Data life cycle model, we assess two models developed in the context of Big Data projects and find them lacking. We propose a Big Data life cycle model that includes activities focused on Big Data and more closely integrates curation with the research life cycle. These activities include planning, acquiring, preparing, analyzing, preserving, and discovering, with describing the Data and assuring quality being an integral part of each activity. We discuss the relationship between institutional Data curation repositories and new long-term Data resources associated with high performance computing centers, and reproducibility in computational science. We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should be asking in a Big Data project.
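
The mapping the abstract describes, crossing the four Big Data characteristics with each life cycle activity to generate practitioner questions, can be sketched directly. The activity and characteristic names come from the abstract; the question wording and function name are our own, not the paper's:

```python
# Activities of the proposed Big Data life cycle model, with describing the
# data and assuring quality as cross-cutting concerns within each activity.
ACTIVITIES = ["planning", "acquiring", "preparing",
              "analyzing", "preserving", "discovering"]
CROSS_CUTTING = ["describing the data", "assuring quality"]

# The four Big Data characteristics the paper maps onto each activity.
CHARACTERISTICS = ["volume", "variety", "velocity", "veracity"]

def practitioner_questions():
    """Cross every activity with every characteristic, one question each."""
    return [f"How does {c} affect {a}?"
            for a in ACTIVITIES for c in CHARACTERISTICS]
```

Crossing six activities with four characteristics yields a checklist of twenty-four questions, which is the practical output of the mapping the abstract describes.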