Google File System

The experts below are selected from a list of 7,320 experts worldwide, ranked by the ideXlab platform.

Sanjay Ghemawat - One of the best experts on this subject based on the ideXlab platform.

  • The Google File System
    Proceedings of the nineteenth ACM symposium on Operating systems principles - SOSP '03, 2003
    Co-Authors: Sanjay Ghemawat, Howard Gobioff, Shun-tak Leung
    Abstract:

    We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.
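
    As a rough illustration of the architecture the abstract summarizes (a single master holding file-to-chunk metadata, chunkservers storing fixed-size chunk replicas, and data flowing directly between clients and chunkservers), the Python sketch below traces the read path described in the paper. It is a toy rendering, not GFS's actual API: all class and method names are invented, and it handles single-chunk reads only.

        # Toy sketch of the GFS client read path; names are illustrative.
        CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses fixed-size 64 MB chunks

        class Chunkserver:
            def __init__(self):
                self.chunks = {}  # chunk handle -> bytes

            def read_chunk(self, handle, offset, length):
                return self.chunks[handle][offset:offset + length]

        class Master:
            """Holds metadata only; file data never flows through it."""
            def __init__(self):
                # (filename, chunk index) -> (chunk handle, [replica server ids])
                self.chunk_table = {}

            def lookup(self, filename, chunk_index):
                return self.chunk_table[(filename, chunk_index)]

        class Client:
            def __init__(self, master, chunkservers):
                self.master = master
                self.chunkservers = chunkservers
                self.cache = {}  # metadata is cached to keep load off the master

            def read(self, filename, offset, length):
                chunk_index = offset // CHUNK_SIZE   # byte offset -> chunk index
                key = (filename, chunk_index)
                if key not in self.cache:            # one metadata RPC, then cached
                    self.cache[key] = self.master.lookup(filename, chunk_index)
                handle, replicas = self.cache[key]
                # a real client picks the nearest replica, not simply the first
                server = self.chunkservers[replicas[0]]
                return server.read_chunk(handle, offset % CHUNK_SIZE, length)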

Shun-tak Leung - One of the best experts on this subject based on the ideXlab platform.

  • The Google File System
    Proceedings of the nineteenth ACM symposium on Operating systems principles - SOSP '03, 2003
    Co-Authors: Sanjay Ghemawat, Howard Gobioff, Shun-tak Leung
    Abstract:

    We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.

Yongxin Zhao - One of the best experts on this subject based on the ideXlab platform.

  • Modeling and Verifying Google File System
    High-Assurance Systems Engineering, 2015
    Co-Authors: Mengdi Wang, Yongxin Zhao, Huibiao Zhu, Fu Song
    Abstract:

    Google File System (GFS) is a distributed file system developed by Google for massive data-intensive applications. Its high aggregate performance in delivering massive data to many clients, combined with the inexpensiveness of commodity hardware, has enabled GFS to meet massive storage needs and to be widely used in industry. In this paper, we first present a formal model of the Google File System in terms of Communicating Sequential Processes (CSP#), which precisely describes the underlying read/write behaviors of GFS. On that basis, both the relaxed consistency and the eventual consistency guaranteed by GFS can be revealed in our framework. Furthermore, the proposed CSP# model is encoded in the Process Analysis Toolkit (PAT), so that several properties such as starvation-freedom and deadlock-freedom can be automatically checked and verified within the framework of formal methods.
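
    For intuition about the relaxed consistency such a model captures: GFS's record append is at-least-once, so a client retry after a partial failure leaves duplicate records on replicas that had already applied the append. The Python toy below (not the authors' CSP# model; the failure injection and names are invented) makes this behavior visible.

        # Toy illustration of GFS's relaxed consistency: record append is
        # at-least-once, so a retry after a partial failure duplicates the
        # record on replicas that had already applied it.
        import random

        def record_append(replicas, record, rng, fail_prob=0.3):
            """Append to every replica in order; on any failure, the client
            retries the whole operation from the first replica."""
            while True:
                for r in replicas:
                    if rng.random() < fail_prob:
                        break              # append to this replica failed: retry all
                    r.append(record)       # this replica has applied the record
                else:
                    return                 # all replicas applied it: append succeeded

        rng = random.Random(7)
        replicas = [[], [], []]
        for rec in ("A", "B", "C"):
            record_append(replicas, rec, rng)
        for i, contents in enumerate(replicas):
            print(f"replica {i}: {contents}")
        # Each replica holds every record at least once; replicas that succeeded
        # before a failed attempt also hold duplicates, which GFS applications
        # must tolerate.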

  • Formalizing Google File System
    Pacific Rim International Symposium on Dependable Computing, 2014
    Co-Authors: Mengdi Wang, Yongxin Zhao
    Abstract:

    Google File System (GFS) is a distributed file system developed by Google for massive data-intensive applications and is widely used in industry today. In this paper, we present a formal model of the Google File System in terms of Communicating Sequential Processes (CSP#), which precisely describes the underlying read/write behaviours of GFS. Based on this model, properties such as deadlock-freedom and the consistency model of GFS can be analyzed and verified in future work.
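
    To illustrate the kind of automatic check such an encoding enables: a model checker like PAT explores the reachable state space and reports any non-final state with no outgoing transition as a deadlock. The Python sketch below applies that idea to a deliberately tiny two-client, two-lock system invented for this example; beyond the technique, it has no connection to the authors' CSP# model.

        # Toy state-space search of the kind PAT performs: enumerate all
        # reachable states and flag any non-final state with no successors.
        from collections import deque

        def two_lock_model(state):
            """state = (pc0, pc1, holder_a, holder_b): client 0 takes lock a
            then lock b; client 1 takes b then a (classic circular wait)."""
            pc0, pc1, a, b = state
            succs = []
            if pc0 == 0 and a is None: succs.append((1, pc1, 0, b))     # 0 grabs a
            if pc0 == 1 and b is None: succs.append((2, pc1, None, b))  # 0 grabs b, finishes, releases
            if pc1 == 0 and b is None: succs.append((pc0, 1, a, 1))     # 1 grabs b
            if pc1 == 1 and a is None: succs.append((pc0, 2, a, None))  # 1 grabs a, finishes, releases
            return succs

        def find_deadlock(initial, step, is_final):
            seen, queue = {initial}, deque([initial])
            while queue:                       # breadth-first reachability search
                s = queue.popleft()
                succs = step(s)
                if not succs and not is_final(s):
                    return s                   # no move possible and not done: deadlock
                queue.extend(t for t in succs if t not in seen)
                seen.update(succs)
            return None

        start = (0, 0, None, None)
        print(find_deadlock(start, two_lock_model, lambda s: s[:2] == (2, 2)))
        # -> (1, 1, 0, 1): each client holds one lock and waits for the other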

Pierre Sens - One of the best experts on this subject based on the ideXlab platform.

  • Analysis of a Stochastic Model of Replication in Large Distributed Storage Systems
    Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2017
    Co-Authors: Wen Sun, Véronique Simon, Sébastien Monnet, Philippe Robert, Pierre Sens
    Abstract:

    Distributed storage systems such as the Hadoop File System or the Google File System (GFS) ensure data availability and durability using replication. Persistence is achieved by replicating the same data block on several nodes and ensuring that a minimum number of copies are available in the system at any time. Whenever the contents of a node are lost, for instance due to a hard-disk crash, the system regenerates the data blocks stored before the failure by transferring them from the remaining replicas. This paper focuses on the efficiency of the replication mechanism that determines the location of the copies of a given file at some server. The variability of the loads of the nodes of the network is investigated for several policies. Three replication mechanisms are tested against simulations in the context of a real implementation of such a system: Random, Least Loaded and Power of Choice. The simulations show that some of these policies may lead to quite unbalanced situations: if β is the average number of copies per node, it turns out that, at equilibrium, the load of the nodes may exhibit high variability. It is shown in this paper that a simple variant of a power-of-choice type algorithm has a striking effect on the loads of the nodes: at equilibrium, the distribution of the load of a node has bounded support, and most nodes have a load of less than 2β, which is an interesting property for the design of the storage space of these systems. Stochastic models are introduced and investigated to explain this interesting phenomenon.
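
    The policy comparison can be illustrated with a small simulation in the spirit of the setup above (a sketch with arbitrary parameters, not the authors' simulator): each copy of a block is placed on the least loaded of d uniformly sampled nodes, where d = 1 recovers the Random policy and d > 1 a power-of-choice policy.

        # Sketch of the placement-policy comparison; parameters are arbitrary.
        import random
        import statistics

        def place_copies(n_nodes, n_blocks, copies=3, d=1, seed=0):
            rng = random.Random(seed)
            load = [0] * n_nodes
            for _ in range(n_blocks):
                targets = []                  # replicas of a block go to distinct nodes
                for _ in range(copies):
                    pool = [n for n in range(n_nodes) if n not in targets]
                    candidates = rng.sample(pool, min(d, len(pool)))
                    best = min(candidates, key=lambda n: load[n])
                    targets.append(best)
                    load[best] += 1
            return load

        n_nodes, n_blocks = 100, 10_000
        beta = 3 * n_blocks / n_nodes         # average number of copies per node
        for d, name in ((1, "Random"), (2, "Power of Choice (d=2)")):
            load = place_copies(n_nodes, n_blocks, d=d)
            print(f"{name}: max load {max(load)}, "
                  f"stdev {statistics.stdev(load):.1f}, 2*beta = {2 * beta:.0f}")
        # Power of Choice yields a visibly tighter load distribution than Random.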

  • Analysis of a Stochastic Model of Replication in Large Distributed Storage Systems: A Mean-Field Approach
    2017
    Co-Authors: Wen Sun, Véronique Simon, Sébastien Monnet, Philippe Robert, Pierre Sens
    Abstract:

    Distributed storage systems such as the Hadoop File System or the Google File System (GFS) ensure data availability and durability using replication. Persistence is achieved by replicating the same data block on several nodes and ensuring that a minimum number of copies are available in the system at any time. Whenever the contents of a node are lost, for instance due to a hard-disk crash, the system regenerates the data blocks stored before the failure by transferring them from the remaining replicas. This paper focuses on the efficiency of the replication mechanism that determines the location of the copies of a given file at some server. The variability of the loads of the nodes of the network is investigated for several policies. Three replication mechanisms are tested against simulations in the context of a real implementation of such a system: Random, Least Loaded and Power of Choice. The simulations show that some of these policies may lead to quite unbalanced situations: if β is the average number of copies per node, it turns out that, at equilibrium, the load of the nodes may exhibit high variability. It is shown in this paper that a simple variant of a power-of-choice type algorithm has a striking effect on the loads of the nodes: at equilibrium, the distribution of the load of a node has bounded support, and most nodes have a load of less than 2β, which is an interesting property for the design of the storage space of these systems. Mathematical models are introduced and investigated to explain this interesting phenomenon. The analysis of these systems turns out to be quite complicated, mainly because of the large dimensionality of the state spaces involved. Our study relies on probabilistic methods, namely mean-field analysis, to analyze the asymptotic behavior of an arbitrary node of the network when the total number of nodes gets large. An additional ingredient is the use of stochastic calculus with marked Poisson point processes to establish some of our results.

Aleksandar Sokolovski - One of the best experts on this subject based on the ideXlab platform.

  • ICT business value of using a cloud-based system in an enterprise environment, the case of Microsoft 365
    2012
    Co-Authors: Jelena Gjorgjev, Saso Gelev, Milena Zivadinovik, Aleksandar Sokolovski
    Abstract:

    The research area of this paper is a rather important ICT technology: the technology that enables Internet-based Cloud Based Systems (CDS). We analyze their potential business value in the corporate world and environment. This technology is key to the operation of any ICT company in today's information society. This paper attempts to explore the possible benefits of using CDS and how, in terms of functionality, they can be improved so as to be more efficient and able to replace some of the functionality of today's offline systems. The main research goal is to study Microsoft 365 and compare it, in terms of functionality, with the most widely used existing CDS (Dropbox, SkyDrive, Google File System). The primary aim is to identify or propose the functionality that the other systems have but that Microsoft 365 lacks, in order to improve this CDS for the business world.

  • Usability aspects of cloud solutions used in an enterprise environment, the case of Microsoft 365
    2012
    Co-Authors: Jelena Gjorgjev, Saso Gelev, Martin Milosavljev-apostolovski, Aleksandar Sokolovski
    Abstract:

    The research area of this paper is a rather important ICT technology: the technology that enables Internet-based Cloud Based Systems (CDS). We analyze their potential business value in the corporate world and environment. This technology is key to the operation of any ICT company in today's information society. This paper attempts to explore the possible benefits of using CDS and how, in terms of functionality, they can be improved so as to be more efficient and able to replace some of the functionality of today's offline systems. The main research goal is to study Microsoft 365 and compare it, in terms of functionality, with the most widely used existing CDS (Dropbox, SkyDrive, Google File System). The primary aim is to identify or propose the functionality that the other systems have but that Microsoft 365 lacks, in order to improve this CDS for the business world.

  • Information system proposal for a cloud-based file system
    2011
    Co-Authors: Aleksandar Sokolovski, Saso Gelev
    Abstract:

    The research area of this paper is a rather important ICT technology: the technology that enables Internet-based file systems, Cloud Based File Systems (CDFS). This technology is key to the operation of any ICT company in today's information society. This paper attempts to explore the possible benefits of using CDFS and to propose a new algorithm for the principle of file sharing among multiple users (multiple-read, single-write file sharing principle, MRSW). The main research goal is to study several of the most widely used existing CDFS (Dropbox, SkyDrive, Google File System, Folio Cloud) and compare them with traditional file-sharing systems (File Transfer Protocol based systems). The primary aim is to find or propose the best possible algorithm for MRSW. This will be validated by testing the proposed algorithm and benchmarking its results against existing file-sharing systems.
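
    The paper's MRSW algorithm itself is not reproduced here, so the following is only a generic sketch of the multiple-read, single-write principle it builds on: a readers-writer lock in which any number of readers share access while a writer is exclusive. The class name and structure are the textbook construction, not the authors' proposal.

        # Generic multiple-read/single-write (MRSW) sketch: concurrent readers,
        # exclusive writers. Textbook readers-writer lock, for illustration only.
        import threading

        class MRSWLock:
            def __init__(self):
                self._readers = 0
                self._mutex = threading.Lock()   # guards the reader count
                self._write = threading.Lock()   # held by a writer, or by readers as a group

            def acquire_read(self):
                with self._mutex:
                    self._readers += 1
                    if self._readers == 1:       # first reader locks writers out
                        self._write.acquire()

            def release_read(self):
                with self._mutex:
                    self._readers -= 1
                    if self._readers == 0:       # last reader lets writers in
                        self._write.release()

            def acquire_write(self):
                self._write.acquire()            # exclusive access for one writer

            def release_write(self):
                self._write.release()

    Guarding a shared file with such a lock lets many clients read concurrently while writes are serialized, which is the MRSW sharing principle the abstract refers to.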