Tiered Storage

The Experts below are selected from a list of 1047 Experts worldwide ranked by ideXlab platform

Yi Yao - One of the best experts on this subject based on the ideXlab platform.

  • Live Data Migration for Reducing SLA Violations in Multi-Tiered Storage Systems
    IEEE International Conference on Cloud Engineering (IC2E), 2014
    Co-Authors: Jianzhe Tai, Bo Sheng, Yi Yao
    Abstract:

    Today, the volume of data in the world has increased tremendously. Large-scale and diverse data sets raise major new challenges for storage, processing, and querying. Tiered storage architectures that combine solid-state drives (SSDs) with hard disk drives (HDDs) have become attractive in enterprise data centers for achieving high performance and large capacity simultaneously. However, how best to use these storage resources and efficiently manage massive data to provide high quality of service (QoS) is still a core and difficult problem. In this paper, we present a new approach for automated data movement in multi-tiered storage systems, which migrates data live across different tiers, aiming to support multiple service level agreements (SLAs) for applications with dynamic workloads at minimal cost. Trace-driven simulations show that, compared to the no-migration policy, the proposed approach (LMsT) significantly improves average I/O response times, I/O violation ratios, and I/O violation times, with only slight degradation (e.g., up to a 6% increase in SLA violation ratio) in the performance of high-priority applications.
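The abstract reports results as I/O violation ratios and violation times but does not define them here. As a minimal sketch, assuming the ratio is the fraction of requests whose response time exceeds an SLA target and the violation time is the average overshoot of those requests, the metrics could be computed as follows. The function names and the sample trace are hypothetical and are not taken from the paper.

```python
def sla_violation_ratio(response_times_ms, sla_target_ms):
    """Fraction of I/O requests whose response time exceeds the SLA target."""
    if not response_times_ms:
        return 0.0
    violations = sum(1 for t in response_times_ms if t > sla_target_ms)
    return violations / len(response_times_ms)


def mean_violation_time(response_times_ms, sla_target_ms):
    """Average overshoot (ms) of the requests that violate the target."""
    overshoots = [t - sla_target_ms for t in response_times_ms if t > sla_target_ms]
    return sum(overshoots) / len(overshoots) if overshoots else 0.0


# Hypothetical response times (ms) before and after moving hot data to the SSD tier.
before = [4.0, 12.0, 30.0, 8.0, 25.0]
after = [2.0, 6.0, 11.0, 4.0, 9.0]

print(sla_violation_ratio(before, 10.0), sla_violation_ratio(after, 10.0))
print(mean_violation_time(before, 10.0), mean_violation_time(after, 10.0))
```

Comparing the two traces against the same target gives a simple way to see whether a migration policy reduced both how often and by how much the SLA was missed.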

Lavanya Ramakrishnan - One of the best experts on this subject based on the ideXlab platform.

  • IPDPS - Data Jockey: Automatic Data Management for HPC Multi-Tiered Storage Systems
    2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019
    Co-Authors: Woong Shin, Devarshi Ghoshal, Christopher Brumgard, Bing Xie, Sudharshan S. Vazhkudai, Sarp Oral, Lavanya Ramakrishnan
    Abstract:

    We present the design and implementation of Data Jockey, a data management system for HPC multi-tiered storage systems. As a centralized data management control plane, Data Jockey automates bulk data movement and placement for scientific workflows and integrates into existing HPC storage infrastructures. Data Jockey simplifies data management by eliminating human effort in programming complex data movements and laying out datasets across multiple storage tiers when supporting complex workflows, which in turn increases the usability of the multi-tiered storage systems emerging in modern HPC data centers. Specifically, Data Jockey presents a new data management scheme called "goal driven data management" that can automatically infer low-level bulk data movement plans from declarative high-level goal statements that come from the lifetime of iterative runs of scientific workflows. While doing so, Data Jockey aims to minimize data wait times by taking responsibility for datasets that are unused or to be used, and aggressively utilizing the capacity of the upper, higher-performing storage tiers. We evaluated a prototype implementation of Data Jockey under a synthetic workload based on a year's worth of Oak Ridge Leadership Computing Facility (OLCF) operational logs. Our evaluations suggest that Data Jockey leads to higher utilization of the upper storage tiers while minimizing the programming effort of data movement compared to human-involved, per-domain, ad hoc data management scripts.

  • MaDaTS: Managing Data on Tiered Storage for Scientific Workflows
    High Performance Distributed Computing (HPDC), 2017
    Co-Authors: Devarshi Ghoshal, Lavanya Ramakrishnan
    Abstract:

    Scientific workflows are increasingly used in High Performance Computing (HPC) environments to manage complex simulations and analyses, often consuming and generating large amounts of data. However, workflow tools have limited support for managing the input, output, and intermediate data. The data elements of a workflow are often managed by the user through scripts or other ad hoc mechanisms. Technology advances for future HPC systems are redefining the memory and storage subsystem by introducing additional tiers to improve the I/O performance of data-intensive applications. These architectural changes introduce additional complexities to managing data for scientific workflows. Thus, we need to manage the scientific workflow data across the tiered storage system on HPC machines. In this paper, we present the design and implementation of MaDaTS (Managing Data on Tiered Storage for Scientific Workflows), a software architecture that manages data for scientific workflows. We introduce Virtual Data Space (VDS), an abstraction of the data in a workflow that hides the complexities of the underlying storage system while allowing users to control data management strategies. We evaluate the data management strategies with real scientific and synthetic workflows, and demonstrate the capabilities of MaDaTS. Our experiments demonstrate the flexibility, performance, and scalability gains of MaDaTS compared to the traditional approach of managing data in scientific workflows.

Ling Liu - One of the best experts on this subject based on the ideXlab platform.

  • Adaptive Data Migration in Multi-Tiered Storage Based Cloud Environment
    IEEE International Conference on Cloud Computing (IEEE CLOUD), 2010
    Co-Authors: Gong Zhang, Lawrence Chiu, Ling Liu
    Abstract:

    Multi-tiered storage systems today are integrating Solid State Disks (SSD) on top of traditional rotational hard disks for performance enhancement, owing to the significant I/O improvements in SSD technology. It is widely recognized that automated data migration between SSD and HDD plays a critical role in the effective integration of SSD into multi-tiered storage systems. Furthermore, effective data migration has to take into account application-specific workload characteristics, deadlines, and I/O profiles. An important and interesting challenge for automated data migration in multi-tiered storage systems is how to fully release the power of data migration while guaranteeing the migration deadline, which is critical to maximizing the performance of an SSD-enabled multi-tiered storage system. In this paper, we present an adaptive lookahead data migration model that can incorporate application-specific characteristics and I/O profiles as well as workload deadlines. Our adaptive data migration model has three unique features. First, its formal model incorporates a set of key factors that may affect the efficiency of lookahead migration. Second, our data migration model can adaptively determine the optimal lookahead window size, based on several parameters, to optimize the effectiveness of lookahead migration. Third, we formally and experimentally show that the adaptive data migration model can improve overall system performance and resource utilization while meeting workload deadlines. Through our trace-driven experimental study, we compare the adaptive lookahead migration approach with the basic migration model and show that the adaptive migration model is effective and efficient for continuously improving and tuning the performance and scalability of multi-tier storage systems.

  • Automated Lookahead Data Migration in SSD-Enabled Multi-Tiered Storage Systems
    IEEE Conference on Mass Storage Systems and Technologies (MSST), 2010
    Co-Authors: Gong Zhang, Lawrence Chiu, Ling Liu, Clem Dickey, Paul H Muench, Sangeetha Seshadri
    Abstract:

    The significant I/O improvements of Solid State Disks (SSD) over traditional rotational hard disks make it attractive to integrate SSDs into tiered storage systems for performance enhancement. However, to integrate SSD into a multi-tiered storage system effectively, automated data migration between SSD and HDD plays a critical role. In many real-world application scenarios, such as banking and supermarket environments, workloads and I/O profiles present interesting characteristics and are also constrained by workload deadlines. How to fully release the power of data migration while guaranteeing the migration deadline is critical to maximizing the performance of an SSD-enabled multi-tiered storage system. In this paper, we present an automated, deadline-aware, lookahead migration scheme to address the data migration challenge. We analyze the factors that may affect the efficiency of lookahead migration and develop a greedy algorithm that adaptively determines the optimal lookahead window size to optimize the effectiveness of lookahead migration, aiming at improving overall system performance and resource utilization while meeting workload deadlines. We compare our lookahead migration approach with the basic migration model and validate the effectiveness and efficiency of our adaptive lookahead migration approach through a trace-driven experimental study.
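The abstract mentions a greedy algorithm for choosing the lookahead window size but gives no details. The sketch below is one possible reading, not the authors' algorithm: it grows the window step by step while an estimated benefit keeps improving and stops at the first step that does not help, or when a deadline-imposed limit is reached. The `benefit` callable, `max_window`, and the toy benefit curve are all assumptions made for illustration.

```python
def greedy_window(benefit, max_window, step=1):
    """Grow the lookahead window while the estimated benefit keeps improving.

    benefit: hypothetical callable mapping a window size to an estimated gain.
    max_window: largest window assumed to still respect the workload deadline.
    """
    best_w, best_gain = 0, benefit(0)
    w = step
    while w <= max_window:
        gain = benefit(w)
        if gain <= best_gain:
            break  # greedy stop: first step that does not improve
        best_w, best_gain = w, gain
        w += step
    return best_w


def toy_benefit(w):
    """Toy curve: gain from pre-warming SSD saturates, migration cost keeps growing."""
    return 10 * min(w, 6) - w * w


print(greedy_window(toy_benefit, max_window=10))
print(greedy_window(toy_benefit, max_window=3))
```

A greedy search like this only finds the true optimum when the benefit curve has a single peak; a real system would estimate the curve from workload traces.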

Jianzhe Tai - One of the best experts on this subject based on the ideXlab platform.

  • Live Data Migration for Reducing SLA Violations in Multi-Tiered Storage Systems
    IEEE International Conference on Cloud Engineering (IC2E), 2014
    Co-Authors: Jianzhe Tai, Bo Sheng, Yi Yao
    Abstract:

    Today, the volume of data in the world has increased tremendously. Large-scale and diverse data sets raise major new challenges for storage, processing, and querying. Tiered storage architectures that combine solid-state drives (SSDs) with hard disk drives (HDDs) have become attractive in enterprise data centers for achieving high performance and large capacity simultaneously. However, how best to use these storage resources and efficiently manage massive data to provide high quality of service (QoS) is still a core and difficult problem. In this paper, we present a new approach for automated data movement in multi-tiered storage systems, which migrates data live across different tiers, aiming to support multiple service level agreements (SLAs) for applications with dynamic workloads at minimal cost. Trace-driven simulations show that, compared to the no-migration policy, the proposed approach (LMsT) significantly improves average I/O response times, I/O violation ratios, and I/O violation times, with only slight degradation (e.g., up to a 6% increase in SLA violation ratio) in the performance of high-priority applications.

Vinodh Venkatesan - One of the best experts on this subject based on the ideXlab platform.

  • Data Prefetching for Large Tiered Storage Systems
    IEEE International Conference on Data Mining (ICDM), 2017
    Co-Authors: Giovanni Cherubini, Yusik Kim, Mark A Lantz, Vinodh Venkatesan
    Abstract:

    In multi-tier storage systems with large amounts of data, most of the data is stored on inexpensive slower tiers such as cloud or tape to achieve cost savings. This also implies that retrieving the data from the slower storage tiers incurs high latency. Therefore, it would be beneficial to proactively prefetch data from slower tiers to faster tiers by predicting future data accesses. State-of-the-art access prediction methods typically record access history of individual files, data objects, or data segments. However, in systems with large amounts of infrequently accessed (or cold) data, file-level access history is often unavailable for much of the data due to the low frequency of access. In this paper, we extract information from file metadata to predict file accesses in a storage system. The proposed method relies on the hypothesis that users and applications access data stored in the system in a given context and that the context and, therefore, the set of files that are likely to be accessed can be identified by detecting access patterns in file metadata. As an application, we consider the LOFAR radio telescope's long-term archive, where the access patterns are learned based on a rich set of metadata, and these patterns are then used to make predictions as to likely future accesses by the astronomers.
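The hypothesis above, that files accessed together share a metadata context, can be illustrated with a deliberately simple rule. The sketch below does not learn patterns as the paper describes; it assumes a fixed, hand-picked set of metadata keys and proposes every file with the same values as a prefetch candidate. The field names and file names are hypothetical.

```python
from collections import defaultdict


def build_context_index(files, keys):
    """Group file names by the values of the chosen metadata keys."""
    index = defaultdict(set)
    for name, meta in files.items():
        index[tuple(meta.get(k) for k in keys)].add(name)
    return index


def prefetch_candidates(accessed, files, index, keys):
    """Files sharing the accessed file's metadata context, minus the file itself."""
    context = tuple(files[accessed].get(k) for k in keys)
    return sorted(index[context] - {accessed})


# Hypothetical archive metadata; the field names are illustrative only.
files = {
    "obs1_a.dat": {"project": "P1", "observation": 1},
    "obs1_b.dat": {"project": "P1", "observation": 1},
    "obs2_a.dat": {"project": "P1", "observation": 2},
}
keys = ("project", "observation")
index = build_context_index(files, keys)

print(prefetch_candidates("obs1_a.dat", files, index, keys))
```

Because the rule needs only metadata, it can suggest candidates even for cold files that have no access history of their own, which is the situation the abstract highlights.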

  • ExaPlan: Efficient Queueing-Based Data Placement, Provisioning, and Load Balancing for Large Tiered Storage Systems
    ACM Transactions on Storage, 2017
    Co-Authors: Ilias Iliadis, Slavisa Sarafijanovic, Jens Jelitto, Yusik Kim, Vinodh Venkatesan
    Abstract:

    Multi-tiered storage, where each tier consists of one type of storage device (e.g., SSD, HDD, or disk arrays), is a commonly used approach to achieve both high performance and cost efficiency in large-scale systems that need to store data with vastly different access characteristics. By aligning the access characteristics of the data, either fixed-sized extents or variable-sized files, to the characteristics of the storage devices, a higher performance can be achieved for any given cost. This article presents ExaPlan, a method to determine both the data-to-tier assignment and the number of devices in each tier that minimize the system’s mean response time for a given budget and workload. In contrast to other methods that constrain or minimize the system load, ExaPlan directly minimizes the system’s mean response time estimated by a queueing model. Minimizing the mean response time is typically intractable as the resulting optimization problem is both nonconvex and combinatorial in nature. ExaPlan circumvents this intractability by introducing a parameterized data placement approach that makes it a highly scalable method that can be easily applied to exascale systems. Through experiments that use parameters from real-world storage systems, such as CERN and LOFAR, it is demonstrated that ExaPlan provides solutions that yield lower mean response times than previous works. It supports standalone SSDs and HDDs as well as disk arrays as storage tiers, and although it uses a static workload representation, we provide empirical evidence that underlying dynamic workloads have invariant properties that can be deemed static for the purpose of provisioning a storage system. ExaPlan is also effective as a load-balancing tool used for placing data across devices within a tier, resulting in up to a 3.6-fold reduction in response time compared with a traditional load-balancing algorithm, such as the Longest Processing Time heuristic.
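The abstract does not say which queueing model ExaPlan uses. As a minimal sketch of the general idea, the code below assumes each tier behaves as an independent M/M/1 queue, so a tier with arrival rate λ and service rate μ has mean response time 1/(μ − λ), and the system mean is the load-weighted average over tiers. The brute-force search over a single "fraction sent to SSD" parameter is a stand-in for the parameterized placement the abstract mentions; all names and rates are hypothetical.

```python
def mean_response_time(tiers):
    """Load-weighted mean response time across tiers, each modeled as M/M/1.

    tiers: list of (arrival_rate, service_rate) pairs in requests per second.
    """
    total = sum(lam for lam, _ in tiers)
    if total == 0:
        return 0.0
    result = 0.0
    for lam, mu in tiers:
        if lam >= mu:
            return float("inf")  # unstable queue: load meets or exceeds capacity
        result += (lam / total) * (1.0 / (mu - lam))
    return result


def best_hot_fraction(total_rate, ssd_mu, hdd_mu, steps=100):
    """Brute-force the fraction of requests sent to SSD that minimizes the mean."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda f: mean_response_time(
            [(f * total_rate, ssd_mu), ((1 - f) * total_rate, hdd_mu)]
        ),
    )


# Two identical tiers sharing the load evenly: 1 / (100 - 50) = 0.02 s each.
print(mean_response_time([(50.0, 100.0), (50.0, 100.0)]))
print(best_hot_fraction(total_rate=100.0, ssd_mu=1000.0, hdd_mu=100.0))
```

This toy version ignores capacity limits and device cost, both of which the real method has to respect when it also chooses the number of devices per tier under a budget.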

  • MASCOTS - Performance Evaluation of a Tape Library System
    2016 IEEE 24th International Symposium on Modeling Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), 2016
    Co-Authors: Ilias Iliadis, Slavisa Sarafijanovic, Yusik Kim, Vinodh Venkatesan
    Abstract:

    Data with vastly different access characteristics is efficiently stored in multi-tiered storage systems. A cost-effective way to retain large volumes of infrequently accessed data is to store it on tape. Steady developments in tape technology deliver ever-increasing storage capacities at low cost. This has established tape as a viable solution to cope with the extreme data growth in the context of Big Data. Assessing the performance of the various tiers is central to achieving appropriate tier dimensioning and storage provisioning. To that end, we develop an analytical model to evaluate the performance of a tape library system that considers various relevant aspects, such as the number of cartridges and tape drives as well as different mount/unmount policies. Closed-form expressions for the corresponding mean waiting times are derived. The validity of the model developed is confirmed by demonstrating that the predicted performance matches well with that obtained by simulation across a wide range of system parameter values.

  • ExaPlan: Queueing-Based Data Placement and Provisioning for Large Tiered Storage Systems
    Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), 2015
    Co-Authors: Ilias Iliadis, Slavisa Sarafijanovic, Jens Jelitto, Yusik Kim, Vinodh Venkatesan
    Abstract:

    Multi-tiered storage, where each tier comprises one type of storage device, e.g., SSD, HDD, is a commonly used approach to achieve both high performance and cost efficiency in large-scale systems that need to store data with vastly different access characteristics. By aligning the access characteristics of the data to the characteristics of the storage devices, higher performance can be achieved for any given cost. This article presents ExaPlan, a method to determine both the data-to-tier assignment and the number of devices in each tier that minimize the system's mean response time for a given budget and workload. In contrast to other methods that constrain or minimize the system load, ExaPlan directly minimizes the system's mean response time estimated by a queueing model. Minimizing the mean response time is typically intractable as the resulting optimization problem is both non-convex and combinatorial in nature. ExaPlan circumvents this intractability by introducing a parameterized data-placement approach that makes it a highly scalable method that can be easily applied to exascale systems. Through experiments that use parameters from real-world storage systems, such as CERN and LOFAR, it is demonstrated that ExaPlan provides solutions that yield lower mean response times than previous works. It is also capable of determining a data-to-tier assignment both at the level of files and at the level of fixed-size extents. For some of the workloads evaluated, file-level placement exhibited a significant performance improvement over extent-level placement.