In-Memory Processing

The experts below are selected from a list of 3,608,859 experts worldwide ranked by the ideXlab platform.

Liqiang Wang - One of the best experts on this subject based on the ideXlab platform.

  • Migrating GIS Big Data Computing from Hadoop to Spark: An Exemplary Study Using Twitter
    International Conference on Cloud Computing, 2016
    Co-Authors: Zhibo Sun, Hong Zhang, Zixia Liu, Chen Xu, Liqiang Wang
    Abstract:

    Recent research has demonstrated that social media can provide valuable spatio-temporal data about users' activities. However, extracting information from and computing over large amounts of data poses various challenges. To process massive datasets effectively, several platforms have been developed. Our previous study [20] explored Hadoop-based cloud computing for processing large amounts of social media data [9] to study geographic distributions of social media users. In this paper, we investigate an emerging system named Spark and present a timely pilot experience on geospatial big data research. In our study, Spark is used to perform some classic geospatial analyses such as K-Nearest Neighbors (KNN), geographic mean and median points, and the distribution of the median points. Our design is tested on an Amazon EC2 cluster. An exemplary study using 60 GB, 120 GB and 180 GB of Twitter data demonstrates the performance gains achieved by migrating computing tasks from Hadoop to Spark. In our experiments, the Spark-based solution is up to 2.3x faster than the Hadoop-based solution, owing to its in-memory processing and coarse-grained resource allocation strategy. We also discuss optimization strategies for using Spark for different geospatial computing tasks.
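
    For readers unfamiliar with the analyses named in the abstract, the following is a minimal single-machine sketch in plain Python of two of them: the geographic mean point and a brute-force K-Nearest Neighbors query. It is not the authors' implementation and does not use Spark. The function names, the spherical-Earth radius, and the choice of haversine distance are all assumptions made for illustration. On a cluster, the per-point sums in `geographic_mean` could presumably be accumulated with a distributed reduce.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def geographic_mean(points):
    """Mean point: average the 3D unit vectors, then convert back to lat/lon."""
    if not points:
        raise ValueError("points must not be empty")
    x = y = z = 0.0
    for lat, lon in points:
        la, lo = math.radians(lat), math.radians(lon)
        x += math.cos(la) * math.cos(lo)
        y += math.cos(la) * math.sin(lo)
        z += math.sin(la)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    mean_lon = math.atan2(y, x)
    mean_lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(mean_lat), math.degrees(mean_lon)


def k_nearest(query, points, k):
    """Brute-force KNN: the k points closest to `query`, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: haversine_km(query, p))
```

    For example, `k_nearest((0.0, 0.0), tweets, 5)` would return the five entries of a hypothetical list `tweets` of (lat, lon) pairs that lie closest to the origin. Averaging unit vectors rather than raw latitudes and longitudes avoids wrap-around errors near the ±180° meridian.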

Zhibo Sun - One of the best experts on this subject based on the ideXlab platform.

  • Migrating GIS Big Data Computing from Hadoop to Spark: An Exemplary Study Using Twitter
    International Conference on Cloud Computing, 2016
    Co-Authors: Zhibo Sun, Hong Zhang, Zixia Liu, Chen Xu, Liqiang Wang
    Abstract:

    Recent research has demonstrated that social media can provide valuable spatio-temporal data about users' activities. However, extracting information from and computing over large amounts of data poses various challenges. To process massive datasets effectively, several platforms have been developed. Our previous study [20] explored Hadoop-based cloud computing for processing large amounts of social media data [9] to study geographic distributions of social media users. In this paper, we investigate an emerging system named Spark and present a timely pilot experience on geospatial big data research. In our study, Spark is used to perform some classic geospatial analyses such as K-Nearest Neighbors (KNN), geographic mean and median points, and the distribution of the median points. Our design is tested on an Amazon EC2 cluster. An exemplary study using 60 GB, 120 GB and 180 GB of Twitter data demonstrates the performance gains achieved by migrating computing tasks from Hadoop to Spark. In our experiments, the Spark-based solution is up to 2.3x faster than the Hadoop-based solution, owing to its in-memory processing and coarse-grained resource allocation strategy. We also discuss optimization strategies for using Spark for different geospatial computing tasks.

Csaba Leranth - One of the best experts on this subject based on the ideXlab platform.

  • Nucleus Reuniens of the Midline Thalamus: Link Between the Medial Prefrontal Cortex and the Hippocampus
    Brain Research Bulletin, 2007
    Co-Authors: Robert P Vertes, Walter B Hoover, Klara Szigeti-Buck, Csaba Leranth
    Abstract:

    The medial prefrontal cortex and the hippocampus serve well-recognized roles in memory processing. The hippocampus projects densely to, and exerts strong excitatory actions on, the medial prefrontal cortex. Interestingly, the medial prefrontal cortex, in rats and other species, has no direct return projections to the hippocampus, and few projections to parahippocampal structures including the entorhinal cortex. It is well established that the nucleus reuniens of the midline thalamus is the major source of thalamic afferents to the hippocampus. Since the medial prefrontal cortex also distributes to nucleus reuniens, we examined medial prefrontal connections with populations of nucleus reuniens neurons projecting to the hippocampus. We used a combined anterograde and retrograde tracing procedure at the light and electron microscopic levels. Specifically, we made Phaseolus vulgaris-leucoagglutinin (PHA-L) injections into the medial prefrontal cortex and Fluorogold (FG) injections into the hippocampus (CA1/subiculum) and examined termination patterns of anterogradely PHA-L labeled fibers on retrogradely FG labeled cells of nucleus reuniens. At the light microscopic level, we showed that fibers from the medial prefrontal cortex form multiple putative synaptic contacts with dendrites of hippocampally projecting neurons throughout the extent of nucleus reuniens. At the ultrastructural level, we showed that medial prefrontal cortical fibers form asymmetric contacts predominantly with dendritic shafts of hippocampally projecting reuniens cells. These findings indicate that nucleus reuniens represents a critical link between the medial prefrontal cortex and the hippocampus. We discuss the possibility that nucleus reuniens gates the flow of information between the medial prefrontal cortex and hippocampus dependent upon attentive/arousal states of the organism.

Uwe Rohm - One of the best experts on this subject based on the ideXlab platform.

  • Technical Report: On the Usability of Hadoop MapReduce, Apache Spark & Apache Flink for Data Science
    arXiv: Distributed Parallel and Cluster Computing, 2018
    Co-Authors: Bilal Akil, Ying Zhou, Uwe Rohm
    Abstract:

    Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level, requiring many implementation steps even for simple analysis tasks. This has led to the development of advanced dataflow-oriented platforms, most prominently Apache Spark and Apache Flink. Those platforms not only aim to improve performance through improved in-memory processing, but in particular provide built-in high-level data processing functionality, such as filtering and join operators, which should make data analysis tasks easier to develop than with plain Hadoop MapReduce. But is this indeed the case? This paper compares three prominent distributed data processing platforms, Apache Hadoop MapReduce, Apache Spark, and Apache Flink, from a usability perspective. We report on the design, execution and results of a usability study with a cohort of master's students, who were learning and working with all three platforms in order to solve different use cases set in a data science context. Our findings show that Spark and Flink are preferred over MapReduce. Among participants, there was no significant difference in perceived preference or development time between Spark and Flink as platforms for batch-oriented big data analysis. This study starts an exploration of the factors that make big data platforms more (or less) effective for users in data science.

  • On the Usability of Hadoop MapReduce, Apache Spark & Apache Flink for Data Science
    International Conference on Big Data, 2017
    Co-Authors: Bilal Akil, Ying Zhou, Uwe Rohm
    Abstract:

    Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level, requiring many implementation steps even for simple analysis tasks. This has led to the development of more advanced dataflow-oriented platforms, most prominently Apache Spark and Apache Flink. Those platforms not only aim to improve performance through improved in-memory processing, but in particular provide built-in high-level data processing functionality, such as filtering and join operators, which should make data analysis tasks easier to develop than with plain Hadoop MapReduce. But is this indeed the case? This paper compares three prominent distributed data processing platforms, Apache Hadoop MapReduce, Apache Spark, and Apache Flink, from a usability perspective. We report on the design, execution and results of a usability study with a cohort of master's students, who were learning and working with all three platforms in order to solve different use cases set in a data science context. Our findings show that Spark and Flink are preferred over MapReduce. Among participants, there was no significant difference in perceived preference or development time between Spark and Flink as platforms for batch-oriented big data analysis. This study starts an exploration of the factors that make big data platforms more (or less) effective for users in data science.

Zixia Liu - One of the best experts on this subject based on the ideXlab platform.

  • Migrating GIS Big Data Computing from Hadoop to Spark: An Exemplary Study Using Twitter
    International Conference on Cloud Computing, 2016
    Co-Authors: Zhibo Sun, Hong Zhang, Zixia Liu, Chen Xu, Liqiang Wang
    Abstract:

    Recent research has demonstrated that social media can provide valuable spatio-temporal data about users' activities. However, extracting information from and computing over large amounts of data poses various challenges. To process massive datasets effectively, several platforms have been developed. Our previous study [20] explored Hadoop-based cloud computing for processing large amounts of social media data [9] to study geographic distributions of social media users. In this paper, we investigate an emerging system named Spark and present a timely pilot experience on geospatial big data research. In our study, Spark is used to perform some classic geospatial analyses such as K-Nearest Neighbors (KNN), geographic mean and median points, and the distribution of the median points. Our design is tested on an Amazon EC2 cluster. An exemplary study using 60 GB, 120 GB and 180 GB of Twitter data demonstrates the performance gains achieved by migrating computing tasks from Hadoop to Spark. In our experiments, the Spark-based solution is up to 2.3x faster than the Hadoop-based solution, owing to its in-memory processing and coarse-grained resource allocation strategy. We also discuss optimization strategies for using Spark for different geospatial computing tasks.