Static Allocation

The Experts below are selected from a list of 17,520 Experts worldwide, ranked by the ideXlab platform.

Rajeev Barua - One of the best experts on this subject based on the ideXlab platform.

  • memory Allocation for embedded systems with a compile time unknown scratch pad size
    ACM Transactions on Embedded Computing Systems, 2009
    Co-Authors: Nghi Nguyen, Angel Dominguez, Rajeev Barua
    Abstract:

    This article presents the first memory Allocation scheme for embedded systems having a scratch-pad memory whose size is unknown at compile time. A scratch-pad memory (SPM) is a fast compiler-managed SRAM that replaces the hardware-managed cache. All existing memory Allocation schemes for SPM require the SPM size to be known at compile time. Unfortunately, because of this constraint, the resulting executable is tied to that size of SPM and is not portable to other processor implementations having a different SPM size. Size-portable code is valuable when programs are downloaded during deployment, either via a network or portable media. Code downloads are used for fixing bugs or for enhancing functionality. The presence of different SPM sizes in different devices is common because of the evolution in VLSI technology across years. The result is that SPM cannot be used in such situations with downloaded code. To overcome this limitation, our work presents a compiler method whose resulting executable is portable across SPMs of any size. Our technique is to employ customized installer software, which decides the SPM Allocation just before the program's first run, since the SPM size can be discovered at that time. The installer then, based on the decided Allocation, modifies the program executable accordingly. The resulting executable places frequently used objects in SPM, considering both code and data for placement. To keep the overhead low, much of the preprocessing for the Allocation is done at compile time. Results show that our benchmarks average a 41% speedup versus an all-DRAM Allocation, while the optimal Static Allocation scheme, which knows the SPM size at compile time and is thus an unachievable upper bound, is only slightly faster (45% faster than all-DRAM). Results also show that the overhead from our customized installer averages about 1.5% in code size, 2% in runtime, and 3% in compile time for our benchmarks.
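
To make the install-time idea concrete, the sketch below shows one plausible shape of the placement decision: a compile-time profile of candidate code and data objects, packed greedily by accesses per byte once the installer can query the actual SPM size. All names, sizes, and counts are invented; the paper's actual allocator and cost model are more elaborate.

```python
# Hypothetical sketch of the install-time placement decision: the compiler
# ships a profile of candidate objects (code and data) ranked ahead of time;
# the installer, once it can query the real SPM size, packs the most
# frequently accessed bytes first. All names and numbers are illustrative.

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str        # symbol name of the code/data object
    size: int        # bytes it occupies
    accesses: int    # profiled access count

def decide_spm_allocation(candidates, spm_size):
    """Greedy placement by accesses-per-byte, decided at install time."""
    ranked = sorted(candidates, key=lambda c: c.accesses / c.size, reverse=True)
    placed, free = [], spm_size
    for cand in ranked:
        if cand.size <= free:
            placed.append(cand.name)
            free -= cand.size
    return placed

# Example: the installer discovers a 2 KB SPM on this particular device.
profile = [
    Candidate("hot_loop", 512, 90000),
    Candidate("coeff_table", 1024, 40000),
    Candidate("init_code", 2048, 300),
    Candidate("stack_frame_fft", 768, 25000),
]
print(decide_spm_allocation(profile, spm_size=2048))
# -> ['hot_loop', 'coeff_table'] with this toy profile
```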

  • dynamic Allocation for scratch pad memory using compile time decisions
    ACM Transactions on Embedded Computing Systems, 2006
    Co-Authors: Sumesh Udayakumaran, Angel Dominguez, Rajeev Barua
    Abstract:

    In this research, we propose a highly predictable, low-overhead, and yet dynamic memory-Allocation strategy for embedded systems with scratch pad memory. A scratch pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees versus cache and by its significantly lower overheads in energy consumption, area, and overall runtime, even with a simple Allocation scheme. Scratch pad Allocation methods are primarily of two types. First, software-caching schemes emulate the workings of a hardware cache in software. Instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption, and SRAM space for tags and deliver poor real-time guarantees just like hardware caches. A second category of algorithms partitions variables at compile-time into the two banks. However, a drawback of such Static Allocation schemes is that they do not account for dynamic program behavior. It is easy to see why a data Allocation that never changes at runtime cannot achieve the full locality benefits of a cache. We propose a dynamic Allocation methodology for global and stack data and program code that: (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no runtime checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the scratch pad using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. When compared to a provably optimal Static Allocation, results show that our scheme reduces runtime by up to 39.8% and energy by up to 31.3%, on average, for our benchmarks, depending on the SRAM size used. The actual gain depends on the SRAM size, but our results show that close to the maximum benefit in runtime and energy is achieved for a substantial range of small SRAM sizes commonly found in embedded systems. Our comparison with a direct-mapped cache shows that our method performs roughly as well as a cached architecture.
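
A minimal sketch of the compile-time-decided dynamic scheme described above, assuming a toy copy-in/evict schedule attached to fixed program points; because the schedule never changes at runtime, every access time is statically known. The object names and sizes are illustrative, not from the paper.

```python
# The compiler picks fixed program points (e.g. loop preheaders) and, for
# each, a set of objects to copy into the scratch pad and a set to evict.
# Nothing here is checked at run time; the schedule is fully static.

SPM_SIZE = 1024

# (program_point, copy_in, evict) triples chosen by the compiler's analysis.
schedule = [
    ("enter_fft_loop",    [("twiddle", 512), ("buf_a", 256)], []),
    ("enter_filter_loop", [("coeffs", 640)],                  ["twiddle"]),
]

def simulate(schedule):
    resident = {}  # object -> size, current SPM contents
    for point, copy_in, evict in schedule:
        for name in evict:                      # compiler-inserted evictions
            resident.pop(name, None)
        for name, size in copy_in:              # compiler-inserted copies
            assert sum(resident.values()) + size <= SPM_SIZE, point
            resident[name] = size
        print(f"{point}: SPM holds {sorted(resident)}")

simulate(schedule)
```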

  • memory Allocation for embedded systems with a compile time unknown scratch pad size
    Compilers Architecture and Synthesis for Embedded Systems, 2005
    Co-Authors: Nghi Nguyen, Angel Dominguez, Rajeev Barua
    Abstract:

    This paper presents the first memory Allocation scheme for embedded systems having scratch-pad memory whose size is unknown at compile time. A scratch-pad memory (SPM) is a fast compiler-managed SRAM that replaces the hardware-managed cache. Its use is motivated by its better real-time guarantees as compared to cache and by its significantly lower overheads in energy consumption, area and access time. Existing data Allocation schemes for SPM all require that the SPM size be known at compile-time. Unfortunately, the resulting executable is tied to that size of SPM and is not portable to processor implementations having a different SPM size. Such portability would be valuable in situations where programs for an embedded system are not burned into the system at the time of manufacture, but rather are downloaded onto it during deployment, either using a network or portable media such as memory sticks. Such post-deployment code updates are common in distributed networks and in personal hand-held devices. The presence of different SPM sizes in different devices is common because of the evolution in VLSI technology across years. The result is that SPM cannot be used in such situations with downloaded code. To overcome this limitation, this work presents a compiler method whose resulting executable is portable across SPMs of any size. The executable at run-time places frequently used objects in SPM; it considers code, global variables and stack variables for placement in SPM. The Allocation is decided by modified loader software before the program is first run, once the SPM size can be discovered. The loader then modifies the program binary based on the decided Allocation. To keep the overhead low, much of the pre-processing for the Allocation is done at compile-time. Results show that our benchmarks average a 36% speed increase versus an all-DRAM Allocation, while the optimal Static Allocation scheme, which knows the SPM size at compile-time and is thus an unachievable upper bound, is only slightly faster (41% faster than all-DRAM). Results also show that the overhead from our embedded loader averages about 1% in both code-size and run-time of our benchmarks.
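
The load-time patching step can be pictured as a relocation pass, as in the hedged sketch below: the compiler ships a table of operand locations that refer to movable objects, and the loader rewrites those operands to SPM addresses once the allocation is decided. The byte layout and names are invented for illustration and are not the paper's actual binary format.

```python
import struct

SPM_BASE = 0x0040_0000  # hypothetical base address of the scratch pad

def patch_binary(image: bytearray, fixups, spm_objects, spm_layout):
    """Rewrite 32-bit little-endian address operands for SPM-placed objects.

    fixups:     {object_name: [byte offsets of operands in the image]}
    spm_layout: {object_name: offset within the SPM}
    """
    for name, offsets in fixups.items():
        if name not in spm_objects:
            continue                      # object stays at its DRAM address
        new_addr = SPM_BASE + spm_layout[name]
        for off in offsets:
            struct.pack_into("<I", image, off, new_addr)
    return image

# Toy image: two 4-byte operands, both initially DRAM addresses.
image = bytearray(struct.pack("<II", 0x8000_1000, 0x8000_2000))
patched = patch_binary(
    image,
    fixups={"coeffs": [0], "log_buf": [4]},
    spm_objects={"coeffs"},
    spm_layout={"coeffs": 0x100},
)
print(hex(struct.unpack_from("<I", patched, 0)[0]))  # 0x400100 (now in SPM)
print(hex(struct.unpack_from("<I", patched, 4)[0]))  # 0x80002000 (unchanged)
```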

  • compiler decided dynamic memory Allocation for scratch pad based embedded systems
    Compilers Architecture and Synthesis for Embedded Systems, 2003
    Co-Authors: Sumesh Udayakumaran, Rajeev Barua
    Abstract:

    This paper presents a highly predictable, low overhead and yet dynamic, memory Allocation strategy for embedded systems with scratch-pad memory. A scratch-pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees versus cache and by its significantly lower overheads in energy consumption, area and overall runtime, even with a simple Allocation scheme [4]. Existing scratch-pad Allocation methods are of two types. First, software-caching schemes emulate the workings of a hardware cache in software. Instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption and SRAM space for tags and deliver poor real-time guarantees just like hardware caches. A second category of algorithms partitions variables at compile-time into the two banks. For example, our previous work in [3] derives a provably optimal Static Allocation for global and stack variables and achieves a speedup over all earlier methods. However, a drawback of such Static Allocation schemes is that they do not account for dynamic program behavior. It is easy to see why a data Allocation that never changes at runtime cannot achieve the full locality benefits of a cache. In this paper we present a dynamic Allocation method for global and stack data that, for the first time: (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no run-time checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the SRAM using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. When compared to a provably optimal Static Allocation, our results show runtime reductions ranging from 11% to 38%, averaging 31.2%, using no additional hardware support. With hardware support for pseudo-DMA and full DMA, which is already provided in some commercial systems, the runtime reductions increase to 33.4% and 34.2% respectively.
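
The profitability reasoning behind compiler-inserted copying can be shown with a small back-of-the-envelope model: a copy pays off only if the cycles saved on later accesses exceed the copy's own cost, and cheaper (pseudo-)DMA transfers tip more copies into profitability, consistent with the larger gains the abstract reports. All cycle counts below are made up.

```python
def copy_is_profitable(words, accesses, dram_lat=20, spm_lat=1, copy_per_word=21):
    copy_cost = words * copy_per_word          # software copy: load + store
    saved = accesses * (dram_lat - spm_lat)    # cheaper accesses afterwards
    return saved > copy_cost

# 256-word array, 5000 profiled accesses in the upcoming loop:
print(copy_is_profitable(256, 5000))                     # True
# Same array, but only 200 accesses: not worth copying in software...
print(copy_is_profitable(256, 200))                      # False
# ...yet with DMA at ~2 cycles/word the copy becomes profitable again.
print(copy_is_profitable(256, 200, copy_per_word=2))     # True
```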

Siva Ram C Murthy - One of the best experts on this subject based on the ideXlab platform.

  • a state space search approach for optimizing reliability and cost of execution in distributed sensor networks
    Journal of Parallel and Distributed Computing, 2009
    Co-Authors: B S Manoj, Archana Sekhar, Siva Ram C Murthy
    Abstract:

    Sensor networks are increasingly being used for applications which require fast processing of data, such as multimedia processing and collaboration among sensors to relay observed data to a base station (BS). Distributed computing can be used on a sensor network to reduce the completion time of a task (an application) and distribute the energy consumption equitably across all sensors, so that certain sensors do not die out faster than the others. The distribution of task modules to sensors should consider not only the time and energy savings but also the reliability of the entire task execution. We formulate the above as an optimization problem and use the A* algorithm, with improvements, to determine an optimal Static Allocation of modules among a set of sensors. We also suggest a faster algorithm, called the greedy A* algorithm, for cases where a sub-optimal solution is sufficient. Both algorithms have been simulated, and the results have been compared in terms of energy savings, decrease in completion time of the task, and the deviation of the sub-optimal solution from the optimal one. The sub-optimal solution required 8%-35% less computation, at the cost of 2.5%-15% deviation from the optimal solution in terms of average energy spent per sensor node. Both the A* and greedy A* algorithms have been shown to distribute energy consumption more uniformly across sensors than centralized execution. The greedy A* algorithm is found to be scalable, as the number of evaluations in determining the Allocation increases linearly with the number of sensors.
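
A compact sketch of the state-space search the abstract describes: a state assigns the first k modules to sensors, g is the cost incurred so far, and the heuristic optimistically gives every unassigned module its cheapest sensor, which keeps it admissible. The cost matrix is fabricated; the real objective also folds in reliability and energy balance.

```python
import heapq

# cost[m][s]: cost of running module m on sensor s (energy/time surrogate).
cost = [
    [4, 2, 7],
    [3, 6, 1],
    [5, 4, 4],
]

def a_star(cost):
    n_mod, n_sen = len(cost), len(cost[0])
    # Admissible heuristic: sum of per-module minima over unassigned modules.
    h = lambda k: sum(min(cost[m]) for m in range(k, n_mod))
    frontier = [(h(0), 0, ())]  # (f, g, partial assignment)
    while frontier:
        f, g, assign = heapq.heappop(frontier)
        k = len(assign)
        if k == n_mod:
            return g, assign            # first complete state popped is optimal
        for s in range(n_sen):
            g2 = g + cost[k][s]
            heapq.heappush(frontier, (g2 + h(k + 1), g2, assign + (s,)))

print(a_star(cost))  # (7, (1, 2, 1)) on this toy matrix
```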

  • a state space search approach for optimizing reliability and cost of execution in distributed sensor networks
    Lecture Notes in Computer Science, 2005
    Co-Authors: Archana Sekhar, B S Manoj, Siva Ram C Murthy
    Abstract:

    Sensor networks are increasingly being used for applications which require fast processing of data, such as multimedia processing. Distributed computing can be used on a sensor network to reduce the completion time of a task and distribute the energy consumption equitably across all sensors. The distribution of task modules to sensors should consider not only the time and energy savings but also the reliability of the entire task execution. We formulate the above as an optimization problem and use the A* algorithm, with improvements, to determine an optimal Static Allocation of modules among a set of sensors. We also suggest a faster but suboptimal algorithm, called the greedy A* algorithm. Both algorithms have been simulated, and the results have been compared in terms of energy savings, decrease in completion time of the task, and the deviation of the sub-optimal solution from the optimal one. The sub-optimal solution required 8%-35% less computation, at the cost of 2.5%-15% deviation from the optimal solution in terms of average energy spent per sensor node. Both the A* and greedy A* algorithms have been shown to distribute energy consumption more uniformly across sensors than centralized execution. The greedy A* algorithm is found to be scalable, as the number of evaluations in determining the Allocation increases linearly with the number of sensors.
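
The greedy variant can be sketched as committing to the locally cheapest sensor for each module in turn instead of keeping a full frontier, so the number of evaluations grows linearly with modules and sensors. The load penalty below is an invented stand-in for the energy-balance term; it is what makes the greedy answer potentially suboptimal.

```python
def greedy_allocate(cost, load_penalty=2):
    # Each additional module on a sensor adds a penalty, modelling the
    # energy-balance term; greedy evaluates only the children of the
    # current state instead of the full A* frontier.
    loads = [0] * len(cost[0])
    g, assign = 0, []
    for m in range(len(cost)):
        s = min(range(len(cost[m])),
                key=lambda j: cost[m][j] + load_penalty * loads[j])
        assign.append(s)
        g += cost[m][s] + load_penalty * loads[s]
        loads[s] += 1
    return g, tuple(assign)

cost = [[4, 2, 7], [3, 6, 1], [5, 4, 4]]
print(greedy_allocate(cost))  # (8, (1, 2, 0)) on this toy matrix
```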

Bernabe Dorronsoro - One of the best experts on this subject based on the ideXlab platform.

  • a power efficient genetic algorithm for resource Allocation in cloud computing data centers
    IEEE International Conference on Cloud Networking, 2014
    Co-Authors: Giuseppe Portaluri, Stefano Giordano, Dzmitry Kliazovich, Bernabe Dorronsoro
    Abstract:

    One of the main challenges in cloud computing is to increase the availability of computational resources while minimizing system power consumption and operational expenses. This article introduces a power-efficient resource Allocation algorithm for tasks in cloud computing data centers. The developed approach is based on genetic algorithms, which ensure performance and scalability to millions of tasks. Resource Allocation is performed taking into account the computational and networking requirements of tasks, and optimizes task completion time and data center power consumption. The evaluation results, obtained using a dedicated open-source genetic multi-objective framework called jMetal, show that the developed approach is able to perform the Static Allocation of a large number of independent tasks on homogeneous single-core servers within the same data center with a quadratic time complexity. Index Terms: Data center, Cloud Computing, Genetic Algorithm, Resource Allocation, Power Efficiency.
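
A bare-bones genetic algorithm in the spirit of the abstract, assuming a made-up fitness that blends makespan with a power term favouring fewer active servers; population size, rates, and the cost model are all illustrative, and jMetal itself is not used here.

```python
import random
random.seed(1)

N_TASKS, N_SERVERS, POP, GENS = 40, 8, 60, 200
work = [random.randint(1, 10) for _ in range(N_TASKS)]  # task lengths

def fitness(chrom):
    # chrom[t] = server assigned to task t; lower fitness is better.
    loads = [0] * N_SERVERS
    for t, s in enumerate(chrom):
        loads[s] += work[t]
    makespan = max(loads)
    active = sum(1 for l in loads if l)       # powered-on servers
    return makespan + 2 * active              # completion time + power term

def evolve():
    pop = [[random.randrange(N_SERVERS) for _ in range(N_TASKS)]
           for _ in range(POP)]
    for _ in range(GENS):
        pop.sort(key=fitness)
        elite = pop[:POP // 2]                # keep the better half
        children = []
        while len(children) < POP - len(elite):
            a, b = random.sample(elite, 2)
            cut = random.randrange(N_TASKS)               # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:                     # point mutation
                child[random.randrange(N_TASKS)] = random.randrange(N_SERVERS)
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)

best = evolve()
print("best fitness:", fitness(best))
```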

E Pratsini - One of the best experts on this subject based on the ideXlab platform.

  • dynamic demand fulfillment in spare parts networks with multiple customer classes
    European Journal of Operational Research, 2013
    Co-Authors: Harold Tiemessen, Moritz Fleischmann, Geert-Jan van Houtum, Jo van Nunen, E Pratsini
    Abstract:

    We study real-time demand fulfillment for networks consisting of multiple local warehouses, where spare parts of expensive technical systems are kept in stock for customers with different service contracts. Each service contract specifies a maximum response time in case of a failure and hourly penalty costs for contract violations. Part requests can be fulfilled from multiple local warehouses via a regular delivery, or from an external source with ample capacity via an expensive emergency delivery. The objective is to minimize delivery cost and penalty cost by smartly allocating items from the available network stock to arriving part requests. We propose a dynamic Allocation rule that belongs to the class of one-step lookahead policies. To approximate the optimal relative cost, we develop an iterative calculation scheme that estimates the expected total cost over an infinite time horizon, assuming that future demands are fulfilled according to a simple Static Allocation rule. In a series of numerical experiments, we compare our dynamic Allocation rule with the optimal Allocation rule, and a simple but widely used Static Allocation rule. We show that the dynamic Allocation rule has a small optimality gap and that it achieves an average cost reduction of 7.9% compared to the Static Allocation rule on a large test bed containing problem instances of real-life size.
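
A hedged sketch of the one-step lookahead rule: for each feasible source, add the immediate delivery and penalty cost to an approximate cost-to-go for the stock state left behind, and pick the cheapest. The cost-to-go V below is a crude stand-in for the paper's iterative estimate under a static base rule; all names and numbers are invented.

```python
def immediate_cost(option, request):
    # Delivery cost, plus a penalty when the response time exceeds the
    # customer's contractual deadline (all numbers illustrative).
    late = max(0, option["hours"] - request["deadline_h"])
    return option["delivery_cost"] + late * request["penalty_per_h"]

def lookahead_choice(stock, options, request, V):
    best, best_cost = None, float("inf")
    for opt in options:
        wh = opt["warehouse"]
        if wh is not None and stock.get(wh, 0) == 0:
            continue                                   # no part on the shelf
        next_stock = dict(stock)
        if wh is not None:
            next_stock[wh] -= 1                        # state the choice leaves
        cost = immediate_cost(opt, request) + V(next_stock)
        if cost < best_cost:
            best, best_cost = opt, cost
    return best, best_cost

# Stand-in cost-to-go: each remaining unit of stock lowers expected future
# cost, so emptying a warehouse is implicitly penalized.
V = lambda stock: -3.0 * sum(stock.values())

stock = {"hamburg": 1, "lyon": 2}
options = [
    {"warehouse": "hamburg", "hours": 2, "delivery_cost": 50},
    {"warehouse": "lyon",    "hours": 9, "delivery_cost": 30},
    {"warehouse": None,      "hours": 4, "delivery_cost": 200},  # emergency
]
request = {"deadline_h": 6, "penalty_per_h": 40}
print(lookahead_choice(stock, options, request, V))  # picks hamburg, cost 44.0
```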
