Fault Tolerance Mechanism

The experts below are selected from a list of 354 experts worldwide ranked by the ideXlab platform.

Itsujiro Arita - One of the best experts on this subject based on the ideXlab platform.

  • Evaluating Low-Cost Fault-Tolerance Mechanism for Microprocessors on Multimedia Applications
    Pacific Rim International Symposium on Dependable Computing, 2001
    Co-Authors: Toshinori Sato, Itsujiro Arita
    Abstract:

    We evaluate a low-cost fault-tolerance mechanism for microprocessors that can detect and recover from transient faults, using multimedia applications. There are two driving forces behind the study of fault-tolerance techniques for microprocessors. One is deep-submicron fabrication technology: future semiconductor technologies could become more susceptible to alpha particles and cosmic radiation. The other is the increasing popularity of mobile platforms. Cell phones are now used for applications that are critical to our financial security, such as flight ticket reservation, mobile banking, and mobile trading, and in such applications computer systems are expected to always work correctly. Based on these observations, we propose a mechanism that exploits time redundancy and builds on the instruction-reissue technique used to recover from incorrect data speculation. Unfortunately, we found a significant performance loss when we evaluated the proposal with the SPEC2000 benchmark suite. We therefore evaluate it with MediaBench, which contains mobile applications that are more practical than those in SPEC2000.
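
The detect-and-reissue idea behind time redundancy can be illustrated in software. The Python sketch below is only an analogy: the paper's mechanism works inside the processor pipeline, whereas this code simply runs an operation twice, treats a mismatch as a transient fault, and reissues. It assumes a transient fault corrupts at most one of the two executions; the names `execute_with_time_redundancy`, `TransientFaultError`, and `flaky_add` are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientFaultError(RuntimeError):
    """Raised when repeated executions never agree (hypothetical error type)."""


def execute_with_time_redundancy(op: Callable[[], T], max_reissues: int = 3) -> T:
    """Execute `op` twice and compare the results (time redundancy).

    Detection: a transient fault is assumed to corrupt at most one of the
    two executions, so a mismatch signals a fault.
    Recovery: both results are discarded and the operation is reissued.
    """
    for _ in range(max_reissues + 1):
        first = op()
        second = op()  # same work, repeated later in time
        if first == second:
            return first
    raise TransientFaultError("results never agreed; the fault may be permanent")


# Usage with hypothetical fault injection: only the first execution is corrupted.
calls = {"n": 0}


def flaky_add() -> int:
    calls["n"] += 1
    return 5 if calls["n"] == 1 else 2 + 2


result = execute_with_time_redundancy(flaky_add)
print(result, calls["n"])  # 4 4
```

Because every operation runs at least twice, even this toy version makes the performance cost of time redundancy easy to see.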

Jinho Ahn - One of the best experts on this subject based on the ideXlab platform.

  • Lightweight Fault Tolerance Mechanism for Distributed Mobile Agent-Based Monitoring
    Consumer Communications and Networking Conference, 2009
    Co-Authors: Jinho Ahn
    Abstract:

    Thanks to the asynchronous and dynamic nature of mobile agents, a number of mobile agent-based monitoring mechanisms have been developed to monitor large-scale, dynamic distributed networked systems adaptively and efficiently. Some of them attempt to adapt to dynamic changes such as network traffic patterns, resource addition and deletion, and network topology. However, failures of domain managers can severely impair correct, real-time, and efficient monitoring in a large-scale mobile agent-based distributed monitoring system. In this paper, we present a novel fault-tolerance mechanism with the following features, suited to large-scale, dynamic, hierarchical mobile agent-based monitoring organizations. It detects failures quickly with low failure-free overhead, because each domain manager sends heartbeat messages to its immediate higher-level manager. It minimizes the number of non-faulty monitoring managers affected by domain-manager failures. Finally, it keeps failure detection consistent and continuous while agents are created, migrate, and terminate, and it can execute consistent takeover actions even when multiple domain managers fail concurrently.
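
Heartbeat-based detection in a manager hierarchy can be sketched as below. This is a minimal illustration, not the paper's protocol: each manager records the last heartbeat time of its direct children, flags a child as failed after a timeout, and, as a hypothetical takeover policy, adopts the failed child's subordinates. The class `ManagerNode`, its methods, and the timeout value are assumptions; agent migration and concurrent failures are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # compare nodes by identity; parent/child links are cyclic
class ManagerNode:
    """A domain manager in a hypothetical monitoring hierarchy."""

    name: str
    parent: Optional["ManagerNode"] = None
    children: List["ManagerNode"] = field(default_factory=list)
    last_heartbeat: Dict[str, float] = field(default_factory=dict)  # child -> time

    def add_child(self, child: "ManagerNode", now: float = 0.0) -> None:
        child.parent = self
        self.children.append(child)
        self.last_heartbeat[child.name] = now  # start the child's timer

    def send_heartbeat(self, now: float) -> None:
        # Each manager reports to its immediate higher-level manager only.
        if self.parent is not None:
            self.parent.last_heartbeat[self.name] = now

    def detect_failures(self, now: float, timeout: float) -> List[str]:
        """Names of children whose last heartbeat is older than `timeout`."""
        return [c.name for c in self.children
                if now - self.last_heartbeat[c.name] > timeout]

    def take_over(self, failed: str, now: float) -> None:
        """Adopt the failed child's subordinates (hypothetical takeover policy)."""
        victim = next(c for c in self.children if c.name == failed)
        self.children.remove(victim)
        del self.last_heartbeat[failed]
        for orphan in victim.children:
            self.add_child(orphan, now)
        victim.children = []


# Usage: root supervises d1 and d2; d1 supervises leaf.
root, d1, d2, leaf = (ManagerNode(n) for n in ("root", "d1", "d2", "leaf"))
root.add_child(d1)
root.add_child(d2)
d1.add_child(leaf)
d1.send_heartbeat(now=10.0)
d2.send_heartbeat(now=14.0)  # d1 falls silent after t=10
failed = root.detect_failures(now=15.0, timeout=3.0)
for name in failed:
    root.take_over(name, now=15.0)
print(failed, [c.name for c in root.children])  # ['d1'] ['d2', 'leaf']
```

Only the failed manager's parent reacts, so non-faulty managers elsewhere in the tree are untouched, which mirrors the locality goal stated in the abstract.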

Toshinori Sato - One of the best experts on this subject based on the ideXlab platform.

  • Evaluating Low-Cost Fault-Tolerance Mechanism for Microprocessors on Multimedia Applications
    Pacific Rim International Symposium on Dependable Computing, 2001
    Co-Authors: Toshinori Sato, Itsujiro Arita

Wei-jen Wang - One of the best experts on this subject based on the ideXlab platform.

  • A Fault Tolerance Mechanism for Semiconductor Equipment Monitoring
    2017 IEEE 7th International Symposium on Cloud and Service Computing (SC2), 2017
    Co-Authors: Shao-jui Chen, Wei-jen Wang
    Abstract:

    As semiconductor manufacturing technology advances, wafers become larger and critical dimensions become smaller, so a single wafer can produce more chips. However, manufacturing chips with today's semiconductor technology is costly: any defect on a wafer may cause the final product to fail and lead to a large business loss. To reduce the chance of wafer defects, the parameters of the manufacturing environment must be precisely controlled. To this end, a monitoring system is usually used to collect real-time information, which shortens the time needed to decide when to change those parameters. Today, most semiconductor manufacturing machines support the SECS/GEM standard, which defines how to obtain monitoring data from the machines via TCP/IP. The problem is that existing monitoring approaches rarely support failover and need human intervention when the system crashes, which implies a long recovery time. Moreover, the failure may cause further problems. For example, a manufacturing alarm system could raise a false alarm or overlook an important abnormality during the outage, because the monitoring system feeds it no data. To solve this problem, we introduce a new fault-tolerant monitoring mechanism based on server redundancy and checkpointing. With the proposed approach, the monitoring system achieves very small downtime, which in turn benefits the manufacturing process and the yield rate.
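
Combining server redundancy with checkpointing can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' design: two servers share a checkpoint store that survives a crash, the primary checkpoints the sequence number of each reading it forwards, and after a crash the backup resumes from that checkpoint so the alarm system sees neither a gap nor a duplicate. It also assumes readings can be re-read by sequence number; SECS/GEM communication is not modeled, and `CheckpointStore` and `MonitorServer` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CheckpointStore:
    """State shared by both servers; assumed to survive a server crash."""

    last_seq: int = -1  # sequence number of the last reading delivered
    delivered: List[int] = field(default_factory=list)  # feed to the alarm system


class MonitorServer:
    """One of two redundant monitoring servers (hypothetical design)."""

    def __init__(self, name: str, store: CheckpointStore) -> None:
        self.name = name
        self.store = store
        self.alive = True

    def process(self, readings: List[int], crash_after: Optional[int] = None) -> None:
        """Deliver readings newer than the checkpoint; optionally simulate a crash."""
        handled = 0
        for seq, value in enumerate(readings):
            if seq <= self.store.last_seq:
                continue  # already delivered before the failover
            if crash_after is not None and handled == crash_after:
                self.alive = False  # simulated crash before this delivery
                return
            self.store.delivered.append(value)
            self.store.last_seq = seq  # checkpoint after every delivery
            handled += 1


# Usage: equipment readings are assumed to be re-readable by sequence number.
readings = [100, 101, 102, 103, 104]
store = CheckpointStore()
primary = MonitorServer("primary", store)
backup = MonitorServer("backup", store)
primary.process(readings, crash_after=2)
if not primary.alive:
    backup.process(readings)  # failover: resume from the last checkpoint
print(store.delivered)  # [100, 101, 102, 103, 104]
```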

  • Virtual Machines of High Availability Using Hardware-Assisted Failure Detection
    International Carnahan Conference on Security Technology, 2015
    Co-Authors: Wei-jen Wang, Shao-jui Chen, Hunglin Huang, Shanhao Chuang, Deron Liang
    Abstract:

    Virtualization technology is widely used in today's cloud computing datacenters. With virtualization, each physical machine in a datacenter can be logically divided into several virtual machines, which can host different types of software services. However, many factors can reduce the availability of the whole system. For example, a failed physical machine automatically takes down all virtual machines on it, and consequently every software service running on those virtual machines. Failures are difficult to detect efficiently on a general-purpose computer architecture, because the hardware does not provide enough information for fast failure detection. In contrast, ATCA (Advanced Telecommunications Computing Architecture) physical machines provide high hardware availability and support IPMI (Intelligent Platform Management Interface), which can quickly detect the hardware status. In this paper, we develop a novel failure model and design a symmetric fault-tolerant mechanism using ATCA physical machines and KVM to provide high system availability. The proposed mechanism divides the ATCA physical machines into pairs, so that the two machines in each pair provide fault tolerance for each other. Once a failure is detected in the physical machine layer or the virtualization layer, the failed virtual machines are recovered on the other physical machine of the pair. We compared the proposed mechanism with an existing VM-based fault-tolerance tool. The results show that the proposed mechanism significantly reduces service downtime and therefore provides better availability for software services running on the virtual machines.
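
The symmetric pairing and the recovery step can be sketched as a placement update. This minimal Python example pairs hosts in list order so that each host backs up its partner, and, once a host is reported down, reassigns that host's VMs to the partner. IPMI-based detection and the actual KVM restart are not modeled; the function names, host names, and VM names are hypothetical.

```python
from typing import Dict, List


def pair_machines(machines: List[str]) -> Dict[str, str]:
    """Map each physical machine to its partner in a symmetric pair."""
    if len(machines) % 2:
        raise ValueError("symmetric pairing needs an even number of machines")
    partner: Dict[str, str] = {}
    for a, b in zip(machines[0::2], machines[1::2]):
        partner[a] = b  # a backs up b ...
        partner[b] = a  # ... and b backs up a
    return partner


def fail_over(placement: Dict[str, str], partner: Dict[str, str],
              failed_host: str) -> Dict[str, str]:
    """Return a new VM -> host map with the failed host's VMs moved to its partner."""
    target = partner[failed_host]
    return {vm: target if host == failed_host else host
            for vm, host in placement.items()}


# Usage with hypothetical host and VM names.
partner = pair_machines(["atca1", "atca2", "atca3", "atca4"])
placement = {"vm-a": "atca1", "vm-b": "atca2", "vm-c": "atca1", "vm-d": "atca3"}
# Assume an IPMI-based monitor has just reported that atca1 is down.
new_placement = fail_over(placement, partner, "atca1")
print(new_placement)
# {'vm-a': 'atca2', 'vm-b': 'atca2', 'vm-c': 'atca2', 'vm-d': 'atca3'}
```

Pairing keeps the recovery target fixed and known in advance, at the cost of the surviving partner absorbing the whole failed host's load.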

Weiwei Lin - One of the best experts on this subject based on the ideXlab platform.

  • A Dynamic Data Fault-Tolerance Mechanism for Cloud Storage
    International Conference on Emerging Intelligent Data and Web Technologies, 2013
    Co-Authors: Bo Liu, Weiwei Lin
    Abstract:

    As the amount of data on the network grows, cloud storage that relies mainly on replication for fault tolerance will find it hard to meet the requirement of storing all data. To improve the data fault tolerance of cloud storage, we propose a dynamic data fault-tolerance mechanism for cloud storage (DDFMCS) that dynamically determines each file's fault-tolerance mechanism from its access frequency ratio (stored in a file access frequency table), the number of fault-tolerance conversions the file has undergone, and how long the file has been stored in the system. We implement DDFMCS on Hadoop and conduct experiments. Experimental results show that DDFMCS can increase the utilization of cloud storage space and improve data access performance.
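
A per-file decision rule driven by the three inputs named in the abstract might look like the sketch below. The abstract does not name the candidate mechanisms or any thresholds, so this example assumes a choice between replication and erasure coding and uses hypothetical cutoffs (`hot_ratio`, `min_age_days`, `max_conversions`); `FileStats` and `choose_scheme` are likewise illustrative names, not DDFMCS's actual interface.

```python
from dataclasses import dataclass

REPLICATION = "replication"
ERASURE_CODING = "erasure_coding"  # assumed alternative to replication


@dataclass
class FileStats:
    accesses: int        # accesses recorded in the file access frequency table
    conversions: int     # times this file's scheme has already been switched
    age_days: float      # how long the file has been stored
    scheme: str = REPLICATION  # current fault-tolerance scheme


def choose_scheme(f: FileStats, total_accesses: int, hot_ratio: float = 0.05,
                  min_age_days: float = 30.0, max_conversions: int = 3) -> str:
    """Pick a fault-tolerance scheme for one file (hypothetical thresholds)."""
    if f.conversions >= max_conversions:
        return f.scheme  # stop converting a file that keeps switching back and forth
    ratio = f.accesses / total_accesses if total_accesses else 0.0
    if ratio >= hot_ratio:
        return REPLICATION  # frequently read: replicas can serve reads in parallel
    if f.age_days >= min_age_days:
        return ERASURE_CODING  # rarely read and long stored: save space
    return f.scheme  # too new to judge; keep the current scheme


# Usage: 1000 accesses recorded in total across all files.
print(choose_scheme(FileStats(accesses=200, conversions=0, age_days=90), 1000))
# replication
print(choose_scheme(FileStats(accesses=1, conversions=0, age_days=90), 1000))
# erasure_coding
```

Capping conversions is one simple way to keep a file whose access pattern oscillates from being re-encoded repeatedly, which would otherwise waste I/O.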