Scratchpad Memory

The experts below are selected from a list of 1398 experts worldwide ranked by the ideXlab platform.

Jingling Xue - One of the best experts on this subject based on the ideXlab platform.

  • Scratchpad Memory aware task scheduling with minimum number of preemptions on a single processor
    Asia and South Pacific Design Automation Conference, 2013
    Co-Authors: Qing Wan, Jingling Xue
    Abstract:

    We propose a unified approach to the problem of scheduling a set of tasks with individual release times, deadlines and precedence constraints, and allocating the data of each task to the SPM (Scratchpad Memory) on a single-processor system. Our approach consists of a task scheduling algorithm and an SPM allocation algorithm. The former constructs a feasible schedule incrementally, aiming to minimize the number of preemptions in the feasible schedule. The latter allocates a portion of the SPM to each task in an efficient way by employing a novel data structure, namely, the preemption graph. We have evaluated our approach and a previous approach by using six task sets. The results show that our approach achieves up to a 20.31% reduction in WCRT (Worst-Case Response Time) over the previous approach.
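    The abstract does not define the preemption graph in detail; as a rough illustration (not the authors' algorithm), the sketch below uses a hypothetical preemption relation to give tasks non-overlapping SPM regions only when one task can preempt the other, letting tasks that never run concurrently share the same space. Task count, sizes and the SPM capacity are made up.

        /* Minimal sketch: greedy SPM partitioning driven by a preemption graph.
         * Tasks joined by an edge may be live concurrently (one preempts the
         * other), so their SPM regions must not overlap; unconnected tasks may
         * reuse the same space.  All numbers are hypothetical. */
        #include <stdio.h>

        #define NTASKS   4
        #define SPM_SIZE 64           /* hypothetical SPM capacity, in KB */

        /* preempt[i][j] = 1 if task i and task j can preempt each other */
        static const int preempt[NTASKS][NTASKS] = {
            {0, 1, 0, 0},
            {1, 0, 1, 0},
            {0, 1, 0, 1},
            {0, 0, 1, 0},
        };
        static const int need[NTASKS] = {16, 24, 16, 32};   /* KB of task data */

        int main(void) {
            int base[NTASKS];
            for (int i = 0; i < NTASKS; i++) {
                int off = 0, moved = 1;
                /* slide task i upward until it clears every placed neighbour */
                while (moved) {
                    moved = 0;
                    for (int j = 0; j < i; j++) {
                        if (preempt[i][j] &&
                            off < base[j] + need[j] && base[j] < off + need[i]) {
                            off = base[j] + need[j];    /* skip past neighbour j */
                            moved = 1;
                        }
                    }
                }
                base[i] = off;
                if (off + need[i] > SPM_SIZE)
                    printf("task %d: does not fit, spill to main memory\n", i);
                else
                    printf("task %d: SPM region [%d, %d) KB\n", i, off, off + need[i]);
            }
            return 0;
        }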

  • WCET-aware data selection and allocation for Scratchpad Memory
    Languages Compilers and Tools for Embedded Systems, 2012
    Co-Authors: Qing Wan, Jingling Xue
    Abstract:

    In embedded systems, SPM (Scratchpad Memory) is an attractive alternative to cache memory due to its lower energy consumption and higher predictability of program execution. This paper studies the problem of placing variables of a program into an SPM such that its WCET (worst-case execution time) is minimized. We propose an efficient dynamic approach that comprises two novel heuristics. The first heuristic iteratively selects the most beneficial variable as an SPM resident candidate based on its impact on the k longest paths of the program. The second heuristic incrementally allocates each SPM resident candidate to the SPM based on graph coloring and acyclic graph orientation. We have evaluated our approach by comparing it with an ILP-based approach and a longest-path-based greedy approach using eight benchmarks selected from the Powerstone and Mälardalen WCET benchmark suites under three different SPM configurations. Our approach achieves up to 21% and 43% improvements in WCET reduction over the ILP-based approach and the greedy approach, respectively.
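    As a rough illustration of the selection idea (not the paper's heuristics), the sketch below greedily moves into a fixed-capacity SPM the variable whose placement most reduces the current worst of k candidate path lengths. Path costs, access counts, sizes and latencies are hypothetical.

        /* Minimal sketch: greedy WCET-aware variable selection.  Each candidate
         * path has a base cost plus per-variable access counts; placing a
         * variable in the SPM lowers its per-access cost.  Each round picks the
         * variable (that still fits) yielding the largest drop in the current
         * worst path length.  All numbers are hypothetical. */
        #include <stdio.h>

        #define NVARS   4
        #define NPATHS  3             /* the k longest paths being tracked */
        #define SPM_CAP 48

        static const int size[NVARS]  = {16, 32, 8, 24};
        static const int base[NPATHS] = {1000, 900, 950};
        static const int acc[NPATHS][NVARS] = {   /* accesses of var v on path p */
            {40, 10,  5,  0},
            { 0, 30, 20,  5},
            {10,  0, 25, 30},
        };
        static const int DRAM_LAT = 10, SPM_LAT = 1;

        static int wcet(const int in_spm[NVARS]) {
            int worst = 0;
            for (int p = 0; p < NPATHS; p++) {
                int len = base[p];
                for (int v = 0; v < NVARS; v++)
                    len += acc[p][v] * (in_spm[v] ? SPM_LAT : DRAM_LAT);
                if (len > worst) worst = len;
            }
            return worst;
        }

        int main(void) {
            int in_spm[NVARS] = {0}, used = 0;
            for (;;) {
                int cur = wcet(in_spm), best = -1, best_wcet = cur;
                for (int v = 0; v < NVARS; v++) {
                    if (in_spm[v] || used + size[v] > SPM_CAP) continue;
                    in_spm[v] = 1;
                    int w = wcet(in_spm);
                    in_spm[v] = 0;
                    if (w < best_wcet) { best_wcet = w; best = v; }
                }
                if (best < 0) break;        /* no improving, fitting variable left */
                in_spm[best] = 1;
                used += size[best];
                printf("place var %d in SPM, WCET %d -> %d\n", best, cur, best_wcet);
            }
            return 0;
        }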

  • Optimal WCET-aware code selection for Scratchpad Memory
    Embedded Software, 2010
    Co-Authors: Jingling Xue, Sri Parameswaran
    Abstract:

    We propose the first polynomial-time code selection algorithm for minimising the worst-case execution time of a non-nested loop executed on a fully pipelined processor that uses Scratchpad Memory to replace the instruction cache. The time complexity of our algorithm is O(m(ne + n² log n)), where n and e are the number of basic blocks and the number of edges in the control flow graph of the loop, and m is the size of the Scratchpad Memory. Furthermore, we propose the first dynamic code selection heuristic for minimising the worst-case execution time of a task by using our algorithm for a non-nested loop. Our simulation results show that our heuristic significantly outperforms a previously known heuristic.
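    The optimal algorithm itself cannot be reproduced from the abstract; the sketch below only shows the evaluation step such a selection scheme relies on, computing the worst-case cost of one loop iteration over a small hypothetical CFG for a given SPM assignment (a longest-path pass over the DAG, related to the ne term in the stated complexity).

        /* Minimal sketch: worst-case path cost of a loop body's CFG under a
         * given SPM assignment.  The CFG, the per-block costs and the
         * assignment are hypothetical; this is not the paper's optimal
         * selection algorithm. */
        #include <stdio.h>

        #define NB 5    /* basic blocks, indexed in topological order */

        static const int nsucc[NB]    = {2, 1, 1, 1, 0};
        static const int succ[NB][2]  = {{1, 2}, {3, 0}, {3, 0}, {4, 0}, {0, 0}};
        static const int cost_mem[NB] = {30, 50, 20, 40, 10};  /* fetched from main memory */
        static const int cost_spm[NB] = { 8, 14,  6, 11,  3};  /* resident in the SPM      */

        int main(void) {
            const int in_spm[NB] = {0, 1, 0, 1, 0};  /* SPM assignment being evaluated */
            int dist[NB] = {0};                      /* worst cost of reaching block b */
            int wcet = 0;

            for (int b = 0; b < NB; b++) {           /* topological order */
                int through = dist[b] + (in_spm[b] ? cost_spm[b] : cost_mem[b]);
                if (nsucc[b] == 0 && through > wcet)
                    wcet = through;                  /* exit block: candidate worst path */
                for (int k = 0; k < nsucc[b]; k++)
                    if (through > dist[succ[b][k]])
                        dist[succ[b][k]] = through;
            }
            printf("worst-case cost of one loop iteration: %d\n", wcet);  /* 71 here */
            return 0;
        }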

Jason Cong - One of the best experts on this subject based on the ideXlab platform.

  • Designing Scratchpad Memory architecture with emerging STT-RAM memory technologies
    International Symposium on Circuits and Systems, 2013
    Co-Authors: Peng Wang, Guangyu Sun, Tao Wang, Yuan Xie, Jason Cong
    Abstract:

    Scratchpad memories (SPMs) have been widely used in embedded systems to achieve comparable performance with better energy efficiency when compared to caches. Spin-transfer torque RAM (STT-RAM) is an emerging nonvolatile memory technology that has low-power and high-density advantages over SRAM. In this study, we explore and evaluate a series of Scratchpad Memory architectures consisting of STT-RAM. The experimental results reveal that with optimized design, STT-RAM is an effective alternative to SRAM for Scratchpad Memory in low-power embedded systems.
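    As a back-of-the-envelope illustration of why the access mix matters for such a design (all per-access energies, leakage figures and counts below are hypothetical, not values from the paper), a read-heavy workload can favour STT-RAM despite its costlier writes because of its much lower leakage:

        /* Back-of-the-envelope sketch, hypothetical numbers only. */
        #include <stdio.h>

        struct tech { const char *name; double e_read, e_write, p_leak; };

        int main(void) {
            /* per-access energy in nJ, leakage power in mW -- illustrative only */
            struct tech sram   = {"SRAM",    0.05, 0.05, 2.0};
            struct tech sttram = {"STT-RAM", 0.04, 0.30, 0.1};
            double reads = 9e6, writes = 1e6, seconds = 0.5;

            struct tech t[2] = {sram, sttram};
            for (int i = 0; i < 2; i++) {
                double e = reads  * t[i].e_read  * 1e-9
                         + writes * t[i].e_write * 1e-9
                         + t[i].p_leak * 1e-3 * seconds;   /* dynamic + leakage, J */
                printf("%-8s total energy: %.4f J\n", t[i].name, e);
            }
            return 0;
        }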

  • A reuse-aware prefetching scheme for Scratchpad Memory
    Design Automation Conference, 2011
    Co-Authors: Jason Cong, Hui Huang, Chunyue Liu, Yi Zou
    Abstract:

    Scratchpad Memory (SPM) has been utilized as a prefetch buffer in embedded systems and parallel architectures to hide memory access latency. However, the impact of reuse patterns on SPM prefetching has not been fully investigated. In this paper we quantify the impact of reuse on SPM prefetching efficiency and propose a reuse-aware SPM prefetching (RASP) scheme. The average performance and energy improvements are 15.9% and 22.0% over cache prefetching, 12.9% and 31.2% over prefetch-only SPM management, and 18.5% and 10% over DRDU [1] with SPM prefetching support.
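    The RASP scheme itself is not detailed in the abstract; the sketch below only illustrates the underlying reuse-aware test, staging a block into the SPM when its expected reuse amortises the transfer cost. All costs, names and thresholds are hypothetical.

        /* Minimal sketch: a reuse-aware prefetch test.  A block is staged into
         * the SPM only if its expected reuse amortises the cost of the DMA
         * transfer; otherwise it is fetched on demand.  Numbers are hypothetical. */
        #include <stdio.h>

        struct block { const char *name; int bytes; int expected_reuses; };

        static const double DMA_SETUP = 200.0;   /* cycles per transfer          */
        static const double DMA_PER_B = 0.25;    /* cycles per byte transferred  */
        static const double MISS_COST = 80.0;    /* cycles per on-demand access  */
        static const double SPM_COST  = 1.0;     /* cycles per SPM access        */

        static int worth_prefetching(const struct block *b) {
            double transfer = DMA_SETUP + DMA_PER_B * b->bytes;
            double saved    = (MISS_COST - SPM_COST) * b->expected_reuses;
            return saved > transfer;
        }

        int main(void) {
            struct block blocks[] = {
                {"tile_A",   4096, 64},  /* heavily reused: worth staging       */
                {"stream_B", 4096,  1},  /* streamed once: prefetch doesn't pay */
            };
            for (int i = 0; i < 2; i++)
                printf("%-8s -> %s\n", blocks[i].name,
                       worth_prefetching(&blocks[i]) ? "prefetch into SPM"
                                                     : "leave in main memory");
            return 0;
        }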

Sri Parameswaran - One of the best experts on this subject based on the ideXlab platform.

  • Optimal WCET-aware code selection for Scratchpad Memory
    Embedded Software, 2010
    Co-Authors: Jingling Xue, Sri Parameswaran
    Abstract:

    We propose the first polynomial-time code selection algorithm for minimising the worst-case execution time of a non-nested loop executed on a fully pipelined processor that uses Scratchpad Memory to replace the instruction cache. The time complexity of our algorithm is O(m(ne + n² log n)), where n and e are the number of basic blocks and the number of edges in the control flow graph of the loop, and m is the size of the Scratchpad Memory. Furthermore, we propose the first dynamic code selection heuristic for minimising the worst-case execution time of a task by using our algorithm for a non-nested loop. Our simulation results show that our heuristic significantly outperforms a previously known heuristic.

  • Exploiting statistical information for implementation of instruction Scratchpad Memory in embedded system
    IEEE Transactions on Very Large Scale Integration Systems, 2006
    Co-Authors: Andhi Janapsatya, Aleksandar Ignjatovic, Sri Parameswaran
    Abstract:

    A method to both reduce energy and improve performance in a processor-based embedded system is described in this paper. Comprising a Scratchpad Memory instead of an instruction cache, the target system dynamically (at runtime) copies into the Scratchpad those code segments that are determined to be beneficial (in terms of energy efficiency and/or speed) to execute from the Scratchpad. We develop a heuristic algorithm to select such code segments based on a metric, called concomitance. Concomitance is derived from the temporal relationships of instructions. A hardware controller is designed and implemented for managing the Scratchpad Memory. Strategically placed custom instructions in the program inform the hardware controller when to copy instructions from the main memory to the Scratchpad. A novel heuristic algorithm is implemented for determining locations within the program where to insert these custom instructions. For a set of realistic benchmarks, experimental results indicate the method uses 41.9% lower energy (on average) and improves performance by 40.0% (on average) when compared to a traditional cache system which is identical in size.
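    The exact definition of concomitance is given in the paper, not here; as a stand-in, the sketch below uses a simplified windowed co-occurrence count over a basic-block trace to show how temporally close blocks can be scored and grouped for copying into the SPM. Trace contents and window size are hypothetical.

        /* Minimal sketch: a simplified co-occurrence proxy for a concomitance-
         * style metric.  Blocks that appear close together in the execution
         * trace score highly, marking them as candidates to be copied into the
         * SPM as a group.  Trace and window size are hypothetical. */
        #include <stdio.h>

        #define NBLOCKS 4
        #define WINDOW  3

        int main(void) {
            /* basic-block execution trace (block ids) */
            const int trace[] = {0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 0, 1};
            const int n = sizeof trace / sizeof trace[0];
            int score[NBLOCKS][NBLOCKS] = {{0}};

            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n && j <= i + WINDOW; j++)
                    if (trace[i] != trace[j]) {
                        score[trace[i]][trace[j]]++;
                        score[trace[j]][trace[i]]++;
                    }

            for (int a = 0; a < NBLOCKS; a++)
                for (int b = a + 1; b < NBLOCKS; b++)
                    if (score[a][b] > 0)
                        printf("blocks %d and %d: co-occurrence %d\n", a, b, score[a][b]);
            return 0;
        }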

  • A novel instruction Scratchpad Memory optimization method based on concomitance metric
    Asia and South Pacific Design Automation Conference, 2006
    Co-Authors: Andhi Janapsatya, Aleksandar Ignjatovic, Sri Parameswaran
    Abstract:

    Scratchpad Memory has been introduced as a replacement for cache memory as it improves the performance of certain embedded systems. Additionally, it has also been demonstrated that Scratchpad Memory can significantly reduce the energy consumption of the memory hierarchy of embedded systems. This is significant, as the memory hierarchy consumes a substantial proportion of the total energy of an embedded system. This paper deals with optimization of the instruction Scratchpad Memory based on a methodology that uses a metric which we call the concomitance. This metric is used to find basic blocks which are executed frequently and in close proximity in time. Once such blocks are found, they are copied into the Scratchpad Memory at appropriate times; this is achieved using a special instruction inserted into the code at appropriate places. For a set of benchmarks taken from Mediabench, our Scratchpad system consumed just 59% (avg) of the energy of the cache system, and 73% (avg) of the energy of the state-of-the-art Scratchpad system, while improving the overall performance. Compared to the state-of-the-art method, the number of instructions copied into the Scratchpad Memory from the main memory is reduced by 88%.
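    To illustrate the runtime side described above (with an ordinary function standing in for the inserted special instruction, and hypothetical addresses, sizes and names):

        /* Minimal sketch: an inserted trigger asks a scratchpad manager to copy
         * a hot region from main memory into the SPM just before it executes.
         * On the real system this would be a single custom instruction decoded
         * by the Scratchpad controller; everything here is illustrative. */
        #include <string.h>
        #include <stdio.h>

        #define SPM_SIZE 1024
        static unsigned char spm[SPM_SIZE];    /* stands in for the on-chip SPM */
        static unsigned char main_mem[4096];   /* stands in for main memory     */

        static void spm_copy(unsigned mem_off, unsigned spm_off, unsigned len) {
            memcpy(&spm[spm_off], &main_mem[mem_off], len);
            printf("copied %u bytes: main_mem+%u -> spm+%u\n", len, mem_off, spm_off);
        }

        static void hot_region(void) {
            /* frequently executed code; would now be served from the SPM */
        }

        int main(void) {
            spm_copy(512, 0, 256);   /* trigger inserted just before the region gets hot */
            hot_region();
            return 0;
        }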

Guy G.f. Lemieux - One of the best experts on this subject based on the ideXlab platform.

  • VEGAS: soft vector processor with Scratchpad Memory
    Field Programmable Gate Arrays, 2011
    Co-Authors: Christopher Han-yu Chou, Aaron Severance, Alex D. Brant, Zhiduo Liu, Saurabh Sant, Guy G.f. Lemieux
    Abstract:

    This paper presents VEGAS, a new soft vector architecture, in which the vector processor reads and writes directly to a Scratchpad Memory instead of a vector register file. The Scratchpad Memory is a more efficient storage medium than a vector register file, allowing up to 9x more data elements to fit into on-chip memory. In addition, the use of fracturable ALUs in VEGAS allows efficient processing of bytes, halfwords and words in the same processor instance, providing up to 4x the operations compared to existing fixed-width soft vector ALUs. Benchmarks show the new VEGAS architecture is 10x to 208x faster than Nios II and has 1.7x to 3.1x better area-delay product than previous vector work, achieving much higher throughput per unit area. To put this performance in perspective, VEGAS is faster than a leading-edge Intel processor at integer matrix multiply. To ease programming effort and provide full debug support, VEGAS uses a C macro API that outputs vector instructions as standard Nios II/f custom instructions.
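    The following is a hypothetical sketch only, not the actual VEGAS macro API: it illustrates the idea of a C macro layer that would expand vector operations on Scratchpad-resident data into custom instructions. Here the operation is emulated with a plain loop so the sketch runs anywhere, and all names are invented.

        /* Hypothetical macro-based vector API sketch; not the real VEGAS API. */
        #include <stdio.h>

        #define VLEN 8
        typedef int vec_t[VLEN];       /* a vector that would live in the scratchpad */

        /* on real hardware this would expand to a custom instruction taking
         * scratchpad offsets; here it is emulated element by element */
        #define VEC_ADD(dst, a, b)                     \
            do {                                       \
                for (int _i = 0; _i < VLEN; _i++)      \
                    (dst)[_i] = (a)[_i] + (b)[_i];     \
            } while (0)

        int main(void) {
            vec_t x = {1, 2, 3, 4, 5, 6, 7, 8}, y = {8, 7, 6, 5, 4, 3, 2, 1}, z;
            VEC_ADD(z, x, y);
            for (int i = 0; i < VLEN; i++) printf("%d ", z[i]);
            printf("\n");
            return 0;
        }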

Mingteng Cao - One of the best experts on this subject based on the ideXlab platform.

  • SPMTM: a novel Scratchpad Memory based hybrid nested transactional memory framework
    Lecture Notes in Computer Science, 2009
    Co-Authors: Degui Feng, Tianzhou Chen, Guanjun Jiang, Tiefei Zhang, Mingteng Cao
    Abstract:

    The chip multiprocessor (CMP) has become the mainstream of processor design with the progress in semiconductor technology. It provides higher concurrency for threads compared with the traditional single-core processor. Lock-based synchronization of multiple threads has been shown to be an inefficient approach with high overhead. Previous work shows that TM (transactional memory) is an efficient solution for synchronizing multiple threads. This paper presents SPMTM, a novel on-chip-memory-based nested TM framework. The on-chip memory used in this framework is not a cache but Scratchpad Memory (SPM), which is software-controlled on-chip SRAM. In SPMTM, TM information is stored in the SPM to enhance access speed and reduce power consumption. Experimental results show that SPMTM obtains an average 16.3% performance improvement on the benchmarks compared with lock-based synchronization, and the improvement becomes more significant as the number of processor cores increases.
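    As a hypothetical illustration (not the SPMTM design) of keeping per-transaction bookkeeping in a reserved Scratchpad region, the sketch below logs eager writes into an SPM-resident undo log and rolls them back on abort; the layout, sizes and API are invented.

        /* Hypothetical sketch: per-transaction undo log kept in a statically
         * reserved scratchpad region, so conflict handling touches fast on-chip
         * memory instead of DRAM.  Not the SPMTM design; names are invented. */
        #include <stdint.h>
        #include <stdio.h>

        #define SPM_TX_SLOTS 16

        struct tx_entry { uintptr_t addr; int old_val; };

        /* would be mapped to the SPM address range by the linker script */
        static struct tx_entry spm_write_set[SPM_TX_SLOTS];
        static int tx_len;

        static void tx_begin(void) { tx_len = 0; }

        static int tx_write(int *p, int v) {
            if (tx_len == SPM_TX_SLOTS) return -1;   /* overflow: fall back or abort */
            spm_write_set[tx_len++] = (struct tx_entry){ (uintptr_t)p, *p };
            *p = v;                                  /* eager update, undo log in SPM */
            return 0;
        }

        static void tx_abort(void) {                 /* roll back from the SPM undo log */
            while (tx_len > 0) {
                struct tx_entry e = spm_write_set[--tx_len];
                *(int *)e.addr = e.old_val;
            }
        }

        int main(void) {
            int shared = 1;
            tx_begin();
            tx_write(&shared, 42);
            tx_abort();                                  /* pretend a conflict was detected */
            printf("shared after abort: %d\n", shared);  /* prints 1 */
            return 0;
        }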
