Data Ingestion

The experts below are selected from a list of 30,570 experts worldwide, ranked by the ideXlab platform.

Jun Tao - One of the best experts on this subject based on the ideXlab platform.

  • DASFAA Workshops - SVIS: Large Scale Video Data Ingestion into Big Data Platform
    Database Systems for Advanced Applications, 2015
    Co-Authors: Xiaoyan Guo, Yu Cao, Jun Tao
    Abstract:

    Utilizing a big data processing platform to analyze and extract insights from unstructured video streams is an emerging trend in the video surveillance area. As the first step, efficiently ingesting video sources into the big data platform is a demanding and challenging problem, yet existing data loading and ingestion tools either lack video ingestion capability or cannot handle such huge volumes of video data. In this paper, we present SVIS, a highly scalable and extensible video data ingestion system that can quickly ingest different kinds of video sources into centralized big data stores. SVIS embeds rich video content processing functionality, e.g., video transcoding and object detection, so the ingested data arrives in the desired formats (i.e., structured data and well-encoded video sequence files) and can be analyzed directly. With a highly scalable architecture and an intelligent scheduling engine, SVIS can be dynamically scaled out to handle large-scale online camera streams and intensive ingestion jobs. SVIS is also highly extensible: it defines interfaces for embedding user-defined modules that support new types of video sources and data sinks. Experimental results show that SVIS has high efficiency and good scalability.
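
    Since the abstract describes the extension interfaces only at a high level, here is a minimal sketch of how pluggable source and sink interfaces plus a per-job pipeline might look. All class and method names are hypothetical illustrations, not SVIS's actual API.

    ```python
    from abc import ABC, abstractmethod
    from typing import Callable, Iterable, Optional

    class VideoSource(ABC):
        """A user-defined video source (e.g. an RTSP camera or a file archive)."""
        @abstractmethod
        def open(self) -> None: ...
        @abstractmethod
        def read_chunk(self) -> Optional[bytes]:
            """Return the next chunk of encoded video, or None when exhausted."""

    class DataSink(ABC):
        """A user-defined sink writing into a centralized big data store."""
        @abstractmethod
        def write(self, chunk: bytes) -> None: ...

    class IngestionJob:
        """One source-to-sink pipeline; a scheduling engine would run many in parallel."""
        def __init__(self, source: VideoSource, sink: DataSink,
                     transforms: Iterable[Callable[[bytes], bytes]] = ()):
            self.source, self.sink = source, sink
            self.transforms = list(transforms)  # e.g. transcoding, object detection

        def run(self) -> None:
            self.source.open()
            while (chunk := self.source.read_chunk()) is not None:
                for transform in self.transforms:  # apply each processing stage in order
                    chunk = transform(chunk)
                self.sink.write(chunk)
    ```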

Xiaoyan Guo - One of the best experts on this subject based on the ideXlab platform.

  • DASFAA Workshops - SVIS: Large Scale Video Data Ingestion into Big Data Platform
    Database Systems for Advanced Applications, 2015
    Co-Authors: Xiaoyan Guo, Yu Cao, Jun Tao
    Abstract:

    Utilizing a big data processing platform to analyze and extract insights from unstructured video streams is an emerging trend in the video surveillance area. As the first step, efficiently ingesting video sources into the big data platform is a demanding and challenging problem, yet existing data loading and ingestion tools either lack video ingestion capability or cannot handle such huge volumes of video data. In this paper, we present SVIS, a highly scalable and extensible video data ingestion system that can quickly ingest different kinds of video sources into centralized big data stores. SVIS embeds rich video content processing functionality, e.g., video transcoding and object detection, so the ingested data arrives in the desired formats (i.e., structured data and well-encoded video sequence files) and can be analyzed directly. With a highly scalable architecture and an intelligent scheduling engine, SVIS can be dynamically scaled out to handle large-scale online camera streams and intensive ingestion jobs. SVIS is also highly extensible: it defines interfaces for embedding user-defined modules that support new types of video sources and data sinks. Experimental results show that SVIS has high efficiency and good scalability.

Yu Cao - One of the best experts on this subject based on the ideXlab platform.

  • DASFAA Workshops - SVIS: Large Scale Video Data Ingestion into Big Data Platform
    Database Systems for Advanced Applications, 2015
    Co-Authors: Xiaoyan Guo, Yu Cao, Jun Tao
    Abstract:

    Utilizing a big data processing platform to analyze and extract insights from unstructured video streams is an emerging trend in the video surveillance area. As the first step, efficiently ingesting video sources into the big data platform is a demanding and challenging problem, yet existing data loading and ingestion tools either lack video ingestion capability or cannot handle such huge volumes of video data. In this paper, we present SVIS, a highly scalable and extensible video data ingestion system that can quickly ingest different kinds of video sources into centralized big data stores. SVIS embeds rich video content processing functionality, e.g., video transcoding and object detection, so the ingested data arrives in the desired formats (i.e., structured data and well-encoded video sequence files) and can be analyzed directly. With a highly scalable architecture and an intelligent scheduling engine, SVIS can be dynamically scaled out to handle large-scale online camera streams and intensive ingestion jobs. SVIS is also highly extensible: it defines interfaces for embedding user-defined modules that support new types of video sources and data sinks. Experimental results show that SVIS has high efficiency and good scalability.

Imad Zaza - One of the best experts on this subject based on the ideXlab platform.

  • Smart City Architecture for Data Ingestion and Analytics: Processes and Solutions
    2018 IEEE Fourth International Conference on Big Data Computing Service and Applications (BigDataService), 2018
    Co-Authors: Pierfrancesco Bellini, Paolo Nesi, Michela Paolucci, Imad Zaza
    Abstract:

    Smart city architectures must take into account a large number of requirements related to the volume of data, the variety of sources, the need to reconcile them into a unified model, the identification of relationships, and the enabling of data analytics processes. Ingested data, both static and real-time, must be stored, aggregated, and integrated to support data analytics, dashboards, and decision making, and thus to provide services for the city. This means the architecture must: i) be compatible with multiple protocols; ii) handle open and private data; iii) work with IoT sensors and the Internet of Everything; iv) perform predictions and behavior analysis and support the development of decision support systems; v) offer a set of dashboards for real-time monitoring of the city; and vi) address security and quality aspects such as robustness, scalability, modularity, and interoperability. This approach is essential to monitor the city's status; to connect the different events that occur in the smart city; and to support public administrators, police departments, civil protection, hospitals, etc., as well as citizens directly, in putting city and regional strategies and guidelines into action. In this paper, we focus on data ingestion and aggregation, highlighting problems and solutions. The proposed solution has been developed and applied in the context of Sii-Mobility, a national smart city project on mobility and transport integrated with services. Sii-Mobility is grounded on the Km4City ontology and tools for smart city data aggregation and service production.
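
    The abstract's first requirement, compatibility with multiple protocols, usually comes down to per-protocol adapters that map raw readings into one unified record model. A minimal sketch of that mapping step follows; the field names and the two example payloads are illustrative assumptions, and the real Km4City model is ontology-based and far richer.

    ```python
    import json
    from datetime import datetime, timezone

    def normalize(payload: dict, source_id: str, protocol: str) -> dict:
        """Map a raw reading from one protocol adapter into a unified record model.
        The field names here are illustrative, not Km4City's actual schema."""
        return {
            "source": source_id,
            "protocol": protocol,
            "observed_at": payload.get("timestamp",
                                       datetime.now(timezone.utc).isoformat()),
            "kind": payload.get("type", "unknown"),
            "value": payload.get("value"),
        }

    # Two hypothetical adapters feeding the same unified store:
    mqtt_reading = {"timestamp": "2018-03-01T10:00:00Z", "type": "parking", "value": 42}
    rest_reading = {"type": "traffic_flow", "value": 130}  # no timestamp supplied

    for proto, raw in [("mqtt", mqtt_reading), ("rest", rest_reading)]:
        print(json.dumps(normalize(raw, source_id="sensor-17", protocol=proto)))
    ```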

Michael J Carey - One of the best experts on this subject based on the ideXlab platform.

  • An LSM-based Tuple Compaction Framework for Apache AsterixDB (Extended Version)
    arXiv: Databases, 2019
    Co-Authors: Wail Y. Alkowaileet, Sattam Alsubaiee, Michael J Carey
    Abstract:

    Document database systems store self-describing semi-structured records, such as JSON, "as-is" without requiring users to pre-define a schema. This provides users with the flexibility to change the structure of incoming records without worrying about taking the system offline or hindering the performance of currently running queries. However, the flexibility of such systems does not come for free: the large amount of redundancy in the records can introduce unnecessary storage overhead and impact query performance. Our focus in this paper is to address the storage overhead issue by introducing a tuple compactor framework that infers and extracts the schema from self-describing semi-structured records during data ingestion. As many prominent document stores, such as MongoDB and Couchbase, adopt Log-Structured Merge (LSM) trees in their storage engines, our framework exploits LSM lifecycle events to piggyback the schema inference and extraction operations. We have implemented and empirically evaluated our approach to measure its impact on storage, data ingestion, and query performance in the context of Apache AsterixDB.

  • An LSM-based Tuple Compaction Framework for Apache AsterixDB
    arXiv: Databases, 2019
    Co-Authors: Wail Y. Alkowaileet, Sattam Alsubaiee, Michael J Carey
    Abstract:

    Document database systems store self-describing records, such as JSON, "as-is" without requiring users to pre-define a schema. This provides users with the flexibility to change the structure of incoming records without worrying about taking the system offline or hindering the performance of currently running queries. However, the flexibility of such systems does not come without a cost: the large amount of redundancy in the stored records can introduce unnecessary storage overhead and impact query performance. Our focus in this paper is to address the storage overhead issue by introducing a tuple compactor framework that infers and extracts the schema from self-describing records during the data ingestion process. As many prominent document store systems, such as MongoDB and Couchbase, adopt Log-Structured Merge (LSM) trees in their storage engines, our framework exploits LSM lifecycle events to piggyback the schema inference and extraction operations. We have implemented and empirically evaluated our approach to measure its impact on storage, data ingestion, and query performance in the context of Apache AsterixDB.
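
    To make the schema-inference idea concrete, here is a minimal sketch of inferring a structural schema from self-describing records and merging it across records, as a tuple compactor could do while an LSM component is written. This is a simplification under assumed semantics (conflicting types widen to a union), not AsterixDB's actual implementation.

    ```python
    def infer(value):
        """Infer a structural type for one JSON value."""
        if isinstance(value, dict):
            return {k: infer(v) for k, v in value.items()}
        return type(value).__name__

    def merge(a, b):
        """Merge two inferred schemas; absent fields stay optional, conflicts widen."""
        if a is None:
            return b
        if b is None:
            return a
        if isinstance(a, dict) and isinstance(b, dict):
            return {k: merge(a.get(k), b.get(k)) for k in a.keys() | b.keys()}
        return a if a == b else f"union({a}|{b})"

    # The tuple compactor would see each record once as an LSM component is written:
    records = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo", "age": 30}]
    schema = None
    for rec in records:
        schema = merge(schema, infer(rec))
    print(schema)  # e.g. {'id': 'int', 'name': 'str', 'age': 'int'}
    ```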

  • An IDEA: An Ingestion Framework for Data Enrichment in AsterixDB
    Proceedings of the VLDB Endowment, 2019
    Co-Authors: Xikui Wang, Michael J Carey
    Abstract:

    Big data today is being generated at an unprecedented rate from various sources such as sensors, applications, and devices, and it often needs to be enriched based on other reference information to support complex analytical queries. Depending on the use case, the enrichment operations can be compiled code, declarative queries, or machine learning models of different complexities. For enrichments that will be frequently used in the future, it can be advantageous to push their computation into the ingestion pipeline so that they can be stored (and queried) together with the data. In some cases, the referenced information may change over time, so the ingestion pipeline should be able to adapt to such changes to guarantee the currency and/or correctness of the enrichment results. In this paper, we present a new data ingestion framework that supports data ingestion at scale, enrichments requiring complex operations, and adaptiveness to reference data changes. We explain how this framework has been built on top of Apache AsterixDB and investigate its performance at scale under various workloads.
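
    A minimal sketch of the core idea, enrichment pushed into the ingestion path against reference data that may change, appears below. The TTL-based refresh and all names are assumptions for illustration; the paper's framework handles reference-data changes inside AsterixDB itself.

    ```python
    import time

    class ReferenceTable:
        """Reference data that may change over time; the pipeline re-reads it
        periodically so enrichment results stay current (a simplification)."""
        def __init__(self, loader, ttl_seconds=60):
            self._loader, self._ttl = loader, ttl_seconds
            self._data, self._loaded_at = loader(), time.monotonic()

        def lookup(self, key):
            if time.monotonic() - self._loaded_at > self._ttl:  # refresh stale data
                self._data, self._loaded_at = self._loader(), time.monotonic()
            return self._data.get(key)

    def enrich(record, sensors: ReferenceTable):
        """Enrich in the ingestion path so results are stored (and queried) with the data."""
        meta = sensors.lookup(record["sensor_id"]) or {}
        return {**record, "location": meta.get("location")}

    sensors = ReferenceTable(lambda: {"s1": {"location": "gate A"}}, ttl_seconds=300)
    print(enrich({"sensor_id": "s1", "reading": 7.2}, sensors))
    ```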

  • Efficient Data Ingestion and Query Processing for LSM-based Storage Systems
    arXiv: Databases, 2018
    Co-Authors: Michael J Carey
    Abstract:

    In recent years, the Log-Structured Merge (LSM) tree has been widely adopted by NoSQL and NewSQL systems for its superior write performance. Despite its popularity, however, most existing work has focused on LSM-based key-value stores with only a primary LSM-tree index; auxiliary structures, which are critical for supporting ad-hoc queries, have received much less attention. In this paper, we focus on efficient data ingestion and query processing for general-purpose LSM-based storage systems. We first propose and evaluate a series of optimizations for efficient batched point lookups, significantly improving the range of applicability of LSM-based secondary indexes. We then present several new and efficient maintenance strategies for LSM-based storage systems. Finally, we have implemented and experimentally evaluated the proposed techniques in the context of the Apache AsterixDB system, and we present the results here.
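
    As a rough illustration of batched point lookups, the sketch below buffers primary keys produced by a secondary-index scan and probes the primary index in sorted batches, so probes proceed in key order rather than randomly. The in-memory dict standing in for the primary LSM index is an obvious simplification.

    ```python
    def batched_lookup(secondary_hits, primary_index, batch_size=1024):
        """Probe the primary index in sorted batches so accesses proceed in key
        order; a real LSM primary index benefits from this locality."""
        buffer = []
        for pk in secondary_hits:          # primary keys from a secondary-index scan
            buffer.append(pk)
            if len(buffer) >= batch_size:  # batch is full: sort, then probe
                yield from (primary_index[k] for k in sorted(buffer))
                buffer.clear()
        yield from (primary_index[k] for k in sorted(buffer))  # final partial batch

    primary = {i: {"id": i, "val": i * i} for i in range(10)}
    hits = [7, 2, 9, 2, 5]
    print(list(batched_lookup(hits, primary, batch_size=3)))
    ```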

  • Data Ingestion in AsterixDB
    Extending Database Technology, 2015
    Co-Authors: Raman Grover, Michael J Carey
    Abstract:

    In this paper we describe the support for data ingestion in AsterixDB, an open-source Big Data Management System (BDMS) that provides a platform for the storage and analysis of large volumes of semi-structured data. Data feeds are a new mechanism for having continuous data arrive into a BDMS from external sources and incrementally populate a persisted dataset and associated indexes. We add a new BDMS architectural component, called a data feed, that makes a Big Data system the caretaker of functionality that used to live outside, and we show how it improves users' lives and system performance. We show how to build the data feed component architecturally, and how an enhanced user model can enable sharing of ingested data. We describe how to make this component fault-tolerant, so that the system manages input in the presence of failures, and elastic, so that variances in incoming data rates can be handled gracefully without data loss if/when desired. Results from initial experiments evaluating the scalability and fault tolerance of the AsterixDB data feed facility are reported, including an evaluation of the built-in ingestion policies and their effect on throughput and latency. An evaluation and comparison with a 'glued together' system formed from the popular engines Storm (for streaming) and MongoDB (for persistence) is also included.
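
    To illustrate what an ingestion policy decides, here is a toy feed in which a policy chooses between unbounded buffering and discarding the oldest records under backpressure. The policy names and semantics are assumptions for illustration; AsterixDB's built-in policies differ in detail.

    ```python
    from collections import deque

    class Feed:
        """A toy data feed with a pluggable ingestion policy (hypothetical names):
        'buffer' queues everything; 'discard' drops the oldest records when full."""
        def __init__(self, sink, policy="buffer", max_buffer=10_000):
            self.sink, self.policy = sink, policy
            self.buffer = deque(maxlen=max_buffer if policy == "discard" else None)

        def on_record(self, record):
            self.buffer.append(record)   # under 'discard', oldest records fall out

        def drain(self):
            """Flush buffered records into the persisted dataset (the sink)."""
            while self.buffer:
                self.sink(self.buffer.popleft())

    ingested = []
    feed = Feed(sink=ingested.append, policy="discard", max_buffer=3)
    for i in range(5):
        feed.on_record(i)
    feed.drain()
    print(ingested)  # [2, 3, 4] -- the two oldest records were discarded
    ```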