Underlying File - Explore the Science & Experts

The experts below are selected from a list of 13,068 experts worldwide ranked by the ideXlab platform.

Youngjae Kim - One of the best experts on this subject based on the ideXlab platform.

An Integrated Indexing and Search Service for Distributed File Systems

IEEE Transactions on Parallel and Distributed Systems, 2020

Co-Authors: Hyogi Sim, Awais Khan, Sudharshan S. Vazhkudai, Seung-hwan Lim, Ali R. Butt, Youngjae Kim

Abstract:

Data services such as search, discovery, and management in scalable distributed environments have traditionally been decoupled from the underlying file systems and are often deployed using external databases and indexing services. However, modern data production rates, looming data movement costs, and the lack of metadata entail revisiting the decoupled file system/data services design philosophy. In this article, we present TagIt, a scalable data management service framework aimed at scientific datasets, which can be integrated into prevalent distributed file system architectures. A key feature of TagIt is a scalable, distributed metadata indexing framework, which facilitates a flexible tagging capability to support data discovery. Furthermore, the tags can be associated with an active operator (for pre-processing, filtering, or automatic metadata extraction), which we seamlessly offload to file servers in a load-aware fashion. We have integrated TagIt into two popular distributed file systems, GlusterFS and CephFS. Our evaluation demonstrates that TagIt can expedite data search operations by up to 10× over the extant decoupled approach.
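To make the tagging idea concrete, here is a minimal in-memory sketch of a tag index: files carry name=value tags, and a search returns the files matching all requested tags. The class and method names (`TagIndex`, `tag`, `untag`, `search`) are hypothetical and are not TagIt's API; TagIt distributes its index across file servers, which this single-process sketch does not attempt.

```python
from collections import defaultdict


class TagIndex:
    """Toy tag index: tag name -> {tag value -> set of file paths}.

    Hypothetical illustration only; a single process, no distribution.
    """

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(set))

    def tag(self, path, name, value):
        # Attach a (name, value) tag to a file path.
        self._index[name][value].add(path)

    def untag(self, path, name, value):
        # Remove a tag if present; unknown names/values are ignored.
        values = self._index.get(name)
        if values is not None and value in values:
            values[value].discard(path)

    def search(self, **criteria):
        # Return sorted paths matching ALL name=value criteria (set intersection).
        result = None
        for name, value in criteria.items():
            matches = self._index.get(name, {}).get(value, set())
            result = set(matches) if result is None else result & matches
        return sorted(result or [])
```

Intersecting per-tag posting sets keeps multi-tag queries simple; a distributed version would presumably shard these sets across servers and merge partial results.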

SC - TagIt: an integrated indexing and search service for file systems

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2017

Co-Authors: Hyogi Sim, Sudharshan S. Vazhkudai, Seung-hwan Lim, Youngjae Kim, Geoffroy Vallée, Ali R. Butt

Abstract:

Data services such as search, discovery, and management in scalable distributed environments have traditionally been decoupled from the underlying file systems and are often deployed using external databases and indexing services. However, modern data production rates, looming data movement costs, and the lack of metadata entail revisiting the decoupled file system/data services design philosophy. In this paper, we present TagIt, a scalable data management service framework aimed at scientific datasets, which is tightly integrated into a shared-nothing distributed file system. A key feature of TagIt is a scalable, distributed metadata indexing framework, on which we build a flexible tagging capability to support data discovery. The tags can also be associated with an active operator (for pre-processing, filtering, or automatic metadata extraction), which we seamlessly offload to file servers in a load-aware fashion. Our evaluation shows that TagIt can expedite data search by up to 10× over the extant decoupled approach.
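The load-aware offloading of active operators can be pictured as a greedy scheduler that sends each operator to the least-loaded server allowed to run it. This is an assumption-laden illustration: the load metric (a count of queued operators), the notion of eligible servers, and the names `pick_server` and `schedule` are hypothetical, and the abstract does not say which policy TagIt actually uses.

```python
def pick_server(loads, candidates):
    """Return the eligible server with the lowest current load.

    loads: dict server -> load (hypothetical metric, e.g. queued operators).
    candidates: servers allowed to run the operator (e.g. those storing the file).
    Ties are broken by server name so the choice is deterministic.
    """
    return min(candidates, key=lambda s: (loads.get(s, 0), s))


def schedule(tasks, loads):
    """Greedily assign each (path, candidates) task to the least-loaded server.

    Works on a copy of `loads`, so the caller's dict is not modified.
    Returns (plan, updated_loads) where plan is a list of (path, server).
    """
    loads = dict(loads)
    plan = []
    for path, candidates in tasks:
        server = pick_server(loads, candidates)
        loads[server] = loads.get(server, 0) + 1  # account for the new operator
        plan.append((path, server))
    return plan, loads
```

Updating the load after every assignment is what keeps a burst of operators from all landing on the same initially idle server.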

Ali R. Butt - One of the best experts on this subject based on the ideXlab platform.

An Integrated Indexing and Search Service for Distributed File Systems

IEEE Transactions on Parallel and Distributed Systems, 2020

SC - TagIt: an integrated indexing and search service for file systems

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2017

Ajay Mohindra - One of the best experts on this subject based on the ideXlab platform.

Server recovery using naturally replicated state: a case study

International Conference on Distributed Computing Systems, 1995

Co-Authors: Murthy V Devarakonda, B Kish, Ajay Mohindra

Abstract:

This paper describes the design and preliminary measurements of a file server recovery scheme that uses state naturally replicated among clients. This scheme, implemented in the Calypso file system, is truly transparent to the user and avoids the overhead of explicit replication. A three-phase protocol reconstructs the server state either on a backup node (if disks are multi-ported) or on the rebooted server node. Measurements show that the recovery time is about 21 seconds for a busy 10-node cluster. However, the time to rebuild the distributed state is only about 1.5 seconds; most of the recovery time is spent replaying the write-ahead log of the underlying file system. Fortunately, the log redo time is bounded by the log size.
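A quick worked check of the two measurements above shows why log replay dominates. The symbols are notation introduced here, not from the paper; treating all non-rebuild time as log replay gives an upper bound, since the abstract says only that "most" of the time is spent there:

```latex
\begin{aligned}
t_{\text{log}} &\lesssim t_{\text{total}} - t_{\text{state}} \approx 21\,\text{s} - 1.5\,\text{s} = 19.5\,\text{s},\\
\frac{t_{\text{log}}}{t_{\text{total}}} &\lesssim \frac{19.5}{21} \approx 0.93 .
\end{aligned}
```

So rebuilding the distributed state accounts for roughly 7% of recovery, and shrinking the log (which bounds redo time) is the main lever on total recovery time.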

ICDCS - Server recovery using naturally replicated state: a case study

Proceedings of 15th International Conference on Distributed Computing Systems, 1

Hyogi Sim - One of the best experts on this subject based on the ideXlab platform.

An Integrated Indexing and Search Service for Distributed File Systems

IEEE Transactions on Parallel and Distributed Systems, 2020

SC - TagIt: an integrated indexing and search service for file systems

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2017

Robert Ross - One of the best experts on this subject based on the ideXlab platform.

Optimizing I/O forwarding techniques for extreme-scale event tracing

Cluster Computing, 2014

Co-Authors: Thomas Ilsche, Robert Ross, Joseph Schuchart, Jason Cope, Dries Kimpe, Terry Jones, Andreas Knüpfer, Kamil Iskra, Wolfgang E. Nagel, Stephen Poole

Abstract:

Programming development tools are a vital component for understanding the behavior of parallel applications. Event tracing is a principal ingredient of these tools, but new and serious challenges place event tracing at risk on extreme-scale machines. As the quantity of captured events increases with concurrency, the additional data can overload the parallel file system and perturb the application being observed. In this work we present a solution for event tracing on extreme-scale machines. We enhance an I/O forwarding software layer to aggregate and reorganize log data prior to writing to the storage system, significantly reducing the burden on the underlying file system. Furthermore, we introduce a sophisticated write buffering capability to limit the impact on the traced application. To validate the approach, we employ the Vampir tracing toolset with these new capabilities. Our results demonstrate that the approach increases the maximum traced application size by a factor of 5×, to more than 200,000 processes.
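The aggregate-and-reorganize step can be illustrated with a small sketch: a forwarding node receives interleaved trace records from many processes and regroups them into one contiguous run per process, so the storage system sees one large write instead of many small interleaved ones. The function name `reorganize` and the (rank, payload) record format are hypothetical; this is not the code of the I/O forwarding layer described above.

```python
def reorganize(records):
    """Group interleaved (rank, payload) records into one contiguous run per rank.

    Returns (blob, layout): blob is the concatenated data to issue as a single
    large write, and layout maps rank -> (offset, length) within blob.
    Within a rank, records keep their arrival order.
    """
    per_rank = {}
    for rank, payload in records:
        per_rank.setdefault(rank, []).append(payload)

    parts = []
    layout = {}
    offset = 0
    for rank in sorted(per_rank):
        chunk = b"".join(per_rank[rank])
        layout[rank] = (offset, len(chunk))
        parts.append(chunk)
        offset += len(chunk)
    return b"".join(parts), layout
```

The `layout` map is what a reader would need to recover each process's trace stream from the aggregated blob.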

SC - Characterization and modeling of PIDX parallel I/O for performance optimization

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '13), 2013

Co-Authors: Sidharth Kumar, Robert Latham, Avishek Saha, Venkatram Vishwanath, Philip Carns, John A. Schmidt, Giorgio Scorzelli, Hemanth Kolla, Ray W. Grout, Robert Ross

Abstract:

Parallel I/O library performance can vary greatly in response to user-tunable parameter values such as aggregator count, file count, and aggregation strategy. Unfortunately, manual selection of these values is time-consuming and depends on characteristics of the target machine, the underlying file system, and the dataset itself. Some characteristics, such as the amount of memory per core, can also impose hard constraints on the range of viable parameter values. In this work we address these problems by using machine learning techniques to model the performance of the PIDX parallel I/O library and select appropriate tunable parameter values. We characterize both the network and I/O phases of PIDX on a Cray XE6 as well as an IBM Blue Gene/P system. We use the results of this study to develop a machine learning model for parameter space exploration and performance prediction.
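As a hedged illustration of model-based parameter selection, the sketch below predicts runtime with k-nearest-neighbor regression over measured (aggregator count, file count) configurations and picks the candidate with the lowest prediction. k-NN is a stand-in chosen for brevity, not necessarily the model used in the paper; the log2 distance and the names `knn_predict` and `best_config` are assumptions for illustration.

```python
import math


def knn_predict(samples, query, k=3):
    """Predict runtime for `query` as the mean runtime of its k nearest samples.

    samples: list of ((aggregators, files), seconds) measurements.
    Distance is Euclidean over log2 of each (positive) parameter, on the
    assumption that such knobs are usually varied in powers of two.
    """
    if not samples:
        raise ValueError("need at least one measured sample")

    def dist(a, b):
        return math.dist([math.log2(x) for x in a], [math.log2(x) for x in b])

    nearest = sorted(samples, key=lambda s: dist(s[0], query))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def best_config(samples, candidates, k=3):
    """Return the candidate configuration with the lowest predicted runtime."""
    return min(candidates, key=lambda c: knn_predict(samples, c, k))
```

Given a set of measured `samples` and a list of untested `candidates`, `best_config(samples, candidates)` returns the configuration whose measured neighbors ran fastest; hard constraints such as memory per core would be applied by filtering `candidates` first.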

HPDC - Enabling event tracing at leadership-class scale through I/O forwarding middleware

Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing (HPDC '12), 2012

Co-Authors: Thomas Ilsche, Robert Ross, Joseph Schuchart, Jason Cope, Dries Kimpe, Terry Jones, Andreas Knüpfer, Kamil Iskra, Wolfgang E. Nagel, Stephen W. Poole

Abstract:

Event tracing is an important tool for understanding the performance of parallel applications. As concurrency increases in leadership-class computing systems, the quantity of performance log data can overload the parallel file system, perturbing the application being observed. In this work we present a solution for event tracing at leadership scales. We enhance the I/O forwarding system software to aggregate and reorganize log data prior to writing to the storage system, significantly reducing the burden on the underlying file system for this type of traffic. Furthermore, we augment the I/O forwarding system with a write buffering capability to limit the impact of artificial perturbations from log data accesses on traced applications. To validate the approach, we modify the Vampir tracing toolset to take advantage of this new capability and show that the approach increases the maximum traced application size by a factor of 5×, to more than 200,000 processes.
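The write buffering idea reduces to a buffer that accumulates small log writes and hands them to the storage layer as one larger write once a size threshold is reached. A minimal sketch, assuming a byte-count threshold and a caller-supplied `sink` callable; the class name `WriteBuffer` and its interface are hypothetical, not the forwarding system's API.

```python
class WriteBuffer:
    """Accumulate small writes and pass them to `sink` as one larger write.

    sink: any callable taking bytes (for example, a file object's write method).
    capacity: flush threshold in bytes (hypothetical tuning knob).
    """

    def __init__(self, sink, capacity):
        self.sink = sink
        self.capacity = capacity
        self._parts = []
        self._size = 0

    def write(self, data):
        # Buffer the record; flush once the threshold is reached or exceeded.
        self._parts.append(data)
        self._size += len(data)
        if self._size >= self.capacity:
            self.flush()

    def flush(self):
        # Emit everything buffered so far as a single write; no-op when empty.
        if self._parts:
            self.sink(b"".join(self._parts))
            self._parts = []
            self._size = 0
```

Calling `flush()` at shutdown drains any partial buffer; a larger `capacity` means fewer, larger writes at the cost of more memory held on the forwarding node.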

On the duality of data-intensive file system design: reconciling HDFS and PVFS

IEEE International Conference on High Performance Computing Data and Analytics, 2011

Co-Authors: Wittawat Tantisiriroj, Swapnil Patil, Samuel Lang, Garth A Gibson, Robert Ross

Abstract:

Data-intensive applications fall into two computing styles: Internet services (cloud computing) or high-performance computing (HPC). In both categories, the underlying file system is a key component for scalable application performance. In this paper, we explore the similarities and differences between PVFS, a parallel file system used in HPC at large scale, and HDFS, the primary storage system used in cloud computing with Hadoop. We integrate PVFS into Hadoop and compare its performance to HDFS using a set of data-intensive computing benchmarks. We study how HDFS-specific optimizations can be matched using PVFS and how the consistency, durability, and persistence tradeoffs made by these file systems affect application performance. We show how to embed multiple replicas into a PVFS file, including a mapping with a complete copy local to the writing client, to emulate HDFS's file layout policies. We also highlight implementation issues with HDFS's dependence on disk bandwidth and benefits from pipelined replication.
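One way to picture the replica mapping is a function that, for each logical block of a file, lists the servers holding its copies: the first copy always lands on the server local to the writing client (so that server holds a complete copy), and the remaining copies rotate over the other servers. This is a sketch under stated assumptions; the round-robin choice for the extra copies and the name `replica_layout` are hypothetical, not the paper's exact layout.

```python
def replica_layout(num_blocks, servers, writer_server, replicas=3):
    """Map each logical block index to the list of servers holding its copies.

    Copy 0 of every block goes to writer_server, so it stores the whole file.
    The other replicas - 1 copies are placed round-robin over the remaining
    servers, offset by block index to spread load (an assumed policy).
    """
    if writer_server not in servers:
        raise ValueError("writer_server must be one of servers")
    if replicas > len(servers):
        raise ValueError("more replicas than servers")
    others = [s for s in servers if s != writer_server]
    layout = []
    for block in range(num_blocks):
        copies = [writer_server]
        for r in range(replicas - 1):
            copies.append(others[(block + r) % len(others)])
        layout.append(copies)
    return layout
```

Because `replicas - 1` never exceeds the number of other servers, the copies of any single block always land on distinct servers.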

PVM/MPI - Implementing MPI-IO shared file pointers without file system support

Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Co-Authors: Robert Latham, Robert Ross, Rajeev Thakur, Brian Toonen

Abstract:

The ROMIO implementation of the MPI-IO standard provides a portable infrastructure for use on top of any number of different underlying storage targets. These targets vary widely in their capabilities, and in some cases additional effort is needed within ROMIO to support all MPI-IO semantics. The MPI-2 standard defines a class of file access routines that use a shared file pointer. These routines require communication internal to the MPI-IO implementation in order to allow processes to atomically update this shared value. We discuss a technique that leverages MPI-2 one-sided operations and can be used to implement this concept without requiring any features from the underlying file system. We then demonstrate through a simulation that our algorithm adds reasonable overhead for independent accesses and very small overhead for collective accesses.
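The core requirement, that concurrent processes atomically read and advance one shared offset, can be simulated with Python threads. This sketch does not reproduce the paper's MPI-2 one-sided algorithm: a `threading.Lock` stands in for whatever mechanism makes the update atomic, and the names `SharedFilePointer` and `simulate` are hypothetical. (MPI-3 later added `MPI_Fetch_and_op`, which performs such an atomic update directly in one-sided communication.)

```python
import threading


class SharedFilePointer:
    """Simulated shared file pointer: an atomic fetch-and-add on one offset.

    Plays the role of the value an MPI-IO implementation would keep on one
    process; the lock stands in for the mechanism that makes updates atomic.
    """

    def __init__(self, start=0):
        self._offset = start
        self._lock = threading.Lock()

    def fetch_and_add(self, nbytes):
        # Atomically return the current offset and advance it by nbytes.
        with self._lock:
            old = self._offset
            self._offset += nbytes
            return old


def simulate(num_procs, nbytes_per_write, writes_per_proc):
    """Run num_procs threads that each claim writes_per_proc file regions.

    Returns a dict rank -> list of offsets that rank would write at.
    """
    fp = SharedFilePointer()
    results = {}

    def worker(rank):
        results[rank] = [fp.fetch_and_add(nbytes_per_write) for _ in range(writes_per_proc)]

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each fetch-and-add returns a distinct offset, the writes land in disjoint regions of the file regardless of how the threads interleave.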
