Cycle Count

14,000,000 Leading Edge Experts on the ideXlab platform

The experts below are selected from a list of 3,747 experts worldwide, ranked by the ideXlab platform.

Ren-song Tsay - One of the best experts on this subject based on the ideXlab platform.

  • A Cycle Count Accurate TLM bus modeling approach
    2013 International Symposium on VLSI Design Automation and Test (VLSI-DAT), 2013
    Co-Authors: Mao-lin Li, Chen-kang Lo, Li-chun Chen, Ren-song Tsay
    Abstract:

    This paper presents an effective Cycle-Count Accurate Transaction Level Modeling (CCA-TLM) and simulation technique for a point-to-point bus. We propose a two-phase bus arbitration model and an FSM-based Composite Master-Slave-pair and Arbiter Transaction (CMSAT) model for efficient and accurate dynamic simulation. This approach is particularly effective for bus architecture validation and contention analysis of complex Multi-Processor System-on-Chip (MPSoC) designs. The experimental results show that the proposed approach runs 23 times faster than the Cycle-Accurate (CA) bus model while keeping timing information 100% accurate at every transaction boundary.

  • Cycle-Count-accurate processor modeling for fast and accurate system-level simulation
    2011 Design Automation & Test in Europe, 2011
    Co-Authors: Chen-kang Lo, Li-chun Chen, Meng-huan Wu, Ren-song Tsay
    Abstract:

    Ideally, system-level simulation should provide high simulation speed with sufficient timing detail for both functional verification and performance evaluation. However, existing Cycle-accurate (CA) and Cycle-approximate (CX) processor models suffer either from low simulation speed due to excessive timing detail or from low accuracy due to simplified timing models. To achieve high simulation speed while maintaining the timing accuracy of system simulation, we propose the first Cycle-Count-accurate (CCA) processor modeling approach, which pre-abstracts the internal pipeline and cache into models with accurate cycle-count information and guarantees accurate timing and functional behavior at the processor interface. The experimental results show that the CCA model runs 50 times faster than the corresponding CA model while providing the same execution cycle count as the target RTL model.

  • A Cycle Count Accurate Timing Model for Fast Memory Simulation
    2010
    Co-Authors: Yilen Lo, Li-chun Chen, Mao-lin Li, Ren-song Tsay
    Abstract:

    In this paper, we propose an effective approach for automatically generating a Cycle-Count Accurate Memory Model (CCAMM) from the Clocked Finite State Machine (CFSM) of the Cycle Accurate Memory Model (CAMM). Since memory accesses increasingly dominate system activity, a correct and efficient memory timing model is essential to system-level simulation. In general, a CCAMM provides sufficient timing accuracy with low simulation overhead and is therefore preferred over the Simple Fixed Delay Model (SFDM), which has low accuracy, or the CAMM, which has low simulation performance. Our proposed approach systematically generates the CCAMM and guarantees its correctness. The experimental results show that the generated model is as accurate as the Register Transfer Level (RTL) model while running 100 times faster.
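
The CCA-TLM bus abstract above rests on the idea that a model can skip cycle-by-cycle stepping yet still report exact times at transaction boundaries. The following minimal Python sketch illustrates that idea under simplifying assumptions of our own: one shared bus, a fixed-priority arbiter, and transactions that each hold the bus for a known number of cycles. The names `Txn`, `ca_simulate` and `cca_simulate` are hypothetical, and the paper's two-phase arbitration and CMSAT model are not reproduced here. The CA version advances one clock at a time, while the CCA version jumps from boundary to boundary, and both report the same start and end cycles.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Txn:
    name: str
    master: int   # lower number = higher fixed priority (assumption)
    request: int  # cycle at which the master raises its request
    length: int   # cycles the transaction occupies the bus (>= 1)

def ca_simulate(txns):
    """Cycle-accurate view: advance the clock one cycle at a time."""
    done, current, start, remaining, cycle = {}, None, 0, 0, 0
    while len(done) < len(txns):
        if current is None:  # bus idle: arbitrate in this cycle
            ready = [t for t in txns if t.name not in done and t.request <= cycle]
            if ready:
                current = min(ready, key=lambda t: t.master)
                start, remaining = cycle, current.length
        if current is not None:
            remaining -= 1  # one cycle of transfer elapses
            if remaining == 0:
                done[current.name] = (start, cycle + 1)
                current = None
        cycle += 1
    return done

def cca_simulate(txns):
    """Cycle-count-accurate view: jump from one transaction boundary to the next."""
    done, now, pending = {}, 0, list(txns)
    while pending:
        ready = [t for t in pending if t.request <= now]
        if not ready:  # nothing requested yet: skip the idle cycles at once
            now = min(t.request for t in pending)
            continue
        winner = min(ready, key=lambda t: t.master)
        done[winner.name] = (now, now + winner.length)
        now += winner.length  # whole transaction accounted for in one step
        pending.remove(winner)
    return done

txns = [Txn("A", master=1, request=0, length=4),
        Txn("B", master=0, request=1, length=2),
        Txn("C", master=2, request=2, length=3)]
assert ca_simulate(txns) == cca_simulate(txns)
print(cca_simulate(txns))  # {'A': (0, 4), 'B': (4, 6), 'C': (6, 9)}
```

The CCA loop runs once per transaction rather than once per clock cycle. This is the source of the speedup that the abstracts report, although the real models also have to capture contention and protocol details that this toy omits.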

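The Cycle-Count-accurate processor modeling abstract above describes pre-abstracting the pipeline and cache into cycle-count information. The following minimal Python sketch shows that general idea under assumptions of our own: each basic block carries a pipeline cycle count computed ahead of time, and at run time only the memory accesses consult a small direct-mapped cache model. The block table, latencies, cache geometry and `run` function are all hypothetical illustrations, not the authors' model.

```python
from dataclasses import dataclass, field

HIT_CYCLES, MISS_CYCLES = 1, 20  # hypothetical cache latencies
LINE_BYTES, NUM_LINES = 32, 64   # hypothetical direct-mapped cache geometry

@dataclass
class DirectMappedCache:
    tags: list = field(default_factory=lambda: [None] * NUM_LINES)

    def access(self, addr: int) -> int:
        """Return the cycles spent on one access and update the cache state."""
        line = addr // LINE_BYTES
        index, tag = line % NUM_LINES, line // NUM_LINES
        if self.tags[index] == tag:
            return HIT_CYCLES
        self.tags[index] = tag
        return MISS_CYCLES

# Pre-abstracted basic blocks: pipeline cycles computed ahead of time
# (hypothetical numbers), plus the data addresses each block touches.
BLOCKS = {
    "loop_body": (6, [0x1000, 0x1004]),
    "exit": (3, [0x2000]),
}

def run(trace, cache):
    """Sum cycles over an executed basic-block trace without stepping the pipeline."""
    cycles = 0
    for name in trace:
        pipeline_cycles, addrs = BLOCKS[name]
        cycles += pipeline_cycles + sum(cache.access(a) for a in addrs)
    return cycles

print(run(["loop_body"] * 3 + ["exit"], DirectMappedCache()))  # 66
```

Simply adding the cache latency to the block's pipeline count is a deliberate simplification in this sketch. A real CCA model must also account for how a stall overlaps with work already in the pipeline.
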
Chen-kang Lo - One of the best experts on this subject based on the ideXlab platform.

  • A Cycle Count Accurate TLM bus modeling approach
    2013 International Symposium on VLSI Design Automation and Test (VLSI-DAT), 2013
    Co-Authors: Mao-lin Li, Chen-kang Lo, Li-chun Chen, Ren-song Tsay
    Abstract:

    This paper presents an effective Cycle-Count Accurate Transaction Level Modeling (CCA-TLM) and simulation technique for a point-to-point bus. We propose a two-phase bus arbitration model and an FSM-based Composite Master-Slave-pair and Arbiter Transaction (CMSAT) model for efficient and accurate dynamic simulation. This approach is particularly effective for bus architecture validation and contention analysis of complex Multi-Processor System-on-Chip (MPSoC) designs. The experimental results show that the proposed approach runs 23 times faster than the Cycle-Accurate (CA) bus model while keeping timing information 100% accurate at every transaction boundary.

  • Cycle-Count-accurate processor modeling for fast and accurate system-level simulation
    2011 Design Automation & Test in Europe, 2011
    Co-Authors: Chen-kang Lo, Li-chun Chen, Meng-huan Wu, Ren-song Tsay
    Abstract:

    Ideally, system-level simulation should provide high simulation speed with sufficient timing detail for both functional verification and performance evaluation. However, existing Cycle-accurate (CA) and Cycle-approximate (CX) processor models suffer either from low simulation speed due to excessive timing detail or from low accuracy due to simplified timing models. To achieve high simulation speed while maintaining the timing accuracy of system simulation, we propose the first Cycle-Count-accurate (CCA) processor modeling approach, which pre-abstracts the internal pipeline and cache into models with accurate cycle-count information and guarantees accurate timing and functional behavior at the processor interface. The experimental results show that the CCA model runs 50 times faster than the corresponding CA model while providing the same execution cycle count as the target RTL model.

  • Automatic generation of Cycle Accurate and Cycle Count Accurate transaction level bus models from a formal model
    2009 Asia and South Pacific Design Automation Conference, 2009
    Co-Authors: Chen-kang Lo, Ren-song Tsay
    Abstract:

    This paper proposes the first automatic approach to simultaneously generating Cycle Accurate and Cycle Count Accurate transaction level bus models. Since Transaction Level Modeling (TLM) has proven to be an effective design methodology for managing the ever-increasing complexity of system-level designs, researchers often exploit various abstraction levels to gain either simulation speed or accuracy. Consequently, designers repeatedly perform the time-consuming task of rewriting models of the same design at different abstraction levels and checking them for consistency. To ease this work, we propose a correct-by-construction method that automatically and simultaneously generates both fast and accurate transaction level bus models for system simulation. The proposed approach relieves designers of the tedious and error-prone process of refining models and checking them for consistency.
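
The abstract above on automatic generation of CA and CCA bus models derives both models from one formal model, so the two agree by construction. The following minimal Python sketch shows that idea with a toy clocked FSM of our own. A transition table keyed by (state, transaction kind) is interpreted one clock at a time for the CA view, and the same table is walked once per kind ahead of time to build a cycle-count lookup for the CCA view. The protocol, state names and functions are hypothetical and do not describe the paper's formal model.

```python
# Toy formal model: (state, transaction kind) -> next state, one step per clock.
FSM = {
    ("IDLE", "read"): "ADDR", ("IDLE", "write"): "ADDR",
    ("ADDR", "read"): "WAIT1", ("ADDR", "write"): "DATA",
    ("WAIT1", "read"): "WAIT2", ("WAIT2", "read"): "DATA",
    ("DATA", "read"): "DONE", ("DATA", "write"): "DONE",
}

def ca_step(state, kind):
    """Cycle-accurate view: advance exactly one clock cycle."""
    return FSM[(state, kind)]

def generate_cca_table(kinds):
    """Derive the CCA view by walking the same FSM once per transaction kind."""
    table = {}
    for kind in kinds:
        state, cycles = "IDLE", 0
        while state != "DONE":
            state = ca_step(state, kind)
            cycles += 1
        table[kind] = cycles
    return table

def ca_run(kinds):
    """CA model: step the FSM every clock; return each transaction's end cycle."""
    ends, cycle = [], 0
    for kind in kinds:
        state = "IDLE"
        while state != "DONE":
            state = ca_step(state, kind)
            cycle += 1
        ends.append(cycle)
    return ends

def cca_run(kinds, table):
    """CCA model: add the pre-computed cycle count of each transaction."""
    ends, cycle = [], 0
    for kind in kinds:
        cycle += table[kind]
        ends.append(cycle)
    return ends

table = generate_cca_table(["read", "write"])
print(table)                                   # {'read': 5, 'write': 3}
print(cca_run(["read", "write", "read"], table))  # [5, 8, 13]
```

The CCA table is computed from the same `FSM` dictionary that the CA model interprets, so any edit to the protocol description changes both views together. This captures the spirit of "correct by construction" in a much simpler setting than the paper's.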

Mao-lin Li - One of the best experts on this subject based on the ideXlab platform.

  • A Cycle Count Accurate TLM bus modeling approach
    2013 International Symposium on VLSI Design Automation and Test (VLSI-DAT), 2013
    Co-Authors: Mao-lin Li, Chen-kang Lo, Li-chun Chen, Ren-song Tsay
    Abstract:

    This paper presents an effective Cycle-Count Accurate Transaction Level Modeling (CCA-TLM) and simulation technique for a point-to-point bus. We propose a two-phase bus arbitration model and an FSM-based Composite Master-Slave-pair and Arbiter Transaction (CMSAT) model for efficient and accurate dynamic simulation. This approach is particularly effective for bus architecture validation and contention analysis of complex Multi-Processor System-on-Chip (MPSoC) designs. The experimental results show that the proposed approach runs 23 times faster than the Cycle-Accurate (CA) bus model while keeping timing information 100% accurate at every transaction boundary.

  • A Cycle Count Accurate Timing Model for Fast Memory Simulation
    2010
    Co-Authors: Yilen Lo, Li-chun Chen, Mao-lin Li, Ren-song Tsay
    Abstract:

    In this paper, we propose an effective approach for automatically generating a Cycle-Count Accurate Memory Model (CCAMM) from the Clocked Finite State Machine (CFSM) of the Cycle Accurate Memory Model (CAMM). Since memory accesses increasingly dominate system activity, a correct and efficient memory timing model is essential to system-level simulation. In general, a CCAMM provides sufficient timing accuracy with low simulation overhead and is therefore preferred over the Simple Fixed Delay Model (SFDM), which has low accuracy, or the CAMM, which has low simulation performance. Our proposed approach systematically generates the CCAMM and guarantees its correctness. The experimental results show that the generated model is as accurate as the Register Transfer Level (RTL) model while running 100 times faster.

  • Cycle Count accurate memory modeling in system level design
    Proceedings of the 7th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '09), 2009
    Co-Authors: Yilen Lo, Mao-lin Li, Ren-song Tsay
    Abstract:

    In this paper, we propose an effective approach for automatically generating a Cycle-Count Accurate Memory Model (CCAMM) from the Clocked Finite State Machine (CFSM) of the Cycle Accurate Memory Model (CAMM). Since memory accesses increasingly dominate system activity, a correct and efficient memory timing model is essential to system-level simulation. In general, a CCAMM provides sufficient timing accuracy with low simulation overhead and is therefore preferred over the Simple Fixed Delay Model (SFDM), which has low accuracy, or the CAMM, which has low simulation performance. Our proposed approach systematically generates the CCAMM and guarantees its correctness. The experimental results show that the generated model is as accurate as the Register Transfer Level (RTL) model while running 100 times faster.
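
The memory-modeling abstracts above argue that a Simple Fixed Delay Model is inaccurate because real memory latency depends on the memory's internal state. The following minimal Python sketch illustrates this with a toy DRAM-like memory of our own that keeps one open row per bank. An access that hits the open row is cheaper than one that must open a new row, and a fixed delay cannot capture that difference. The latencies, geometry and class names are hypothetical. The per-access counts are also written by hand here rather than generated from a CFSM as in the papers.

```python
ROW_HIT, ROW_MISS, FIXED = 2, 6, 4  # hypothetical latencies in cycles
ROW_BYTES, NUM_BANKS = 1024, 4      # hypothetical memory geometry

class FixedDelayModel:
    """SFDM-style: every access costs the same number of cycles."""
    def access(self, addr):
        return FIXED

class CycleCountModel:
    """CCA-style: the cost of each access depends on the open-row state."""
    def __init__(self):
        self.open_row = [None] * NUM_BANKS

    def access(self, addr):
        row = addr // ROW_BYTES
        bank = row % NUM_BANKS
        if self.open_row[bank] == row:
            return ROW_HIT
        self.open_row[bank] = row  # open the new row (row miss)
        return ROW_MISS

def total_cycles(model, addrs):
    return sum(model.access(a) for a in addrs)

streaming = [0, 8, 16, 24]         # same row: mostly row hits
conflicting = [0, 4096, 0, 4096]   # rows 0 and 4 share bank 0: all row misses
print(total_cycles(FixedDelayModel(), streaming), total_cycles(CycleCountModel(), streaming))      # 16 12
print(total_cycles(FixedDelayModel(), conflicting), total_cycles(CycleCountModel(), conflicting))  # 16 24
```

In this toy, the fixed delay overestimates the streaming pattern and underestimates the bank-conflict pattern. A state-aware count gets both right without modeling every internal clock edge.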

Tian-sheuan Chang - One of the best experts on this subject based on the ideXlab platform.

  • SIFME: A Single Iteration Fractional-Pel Motion Estimation Algorithm and Architecture for HDTV-Sized H.264 Video Coding
    International Conference on Acoustics, Speech and Signal Processing, 2007
    Co-Authors: Tian-sheuan Chang
    Abstract:

    This paper presents a fast algorithm and VLSI architecture for HDTV-sized H.264 fractional motion estimation. To address the long computational latency in HD-sized applications, we propose a single-iteration algorithm with only six search points. This single-iteration method halves the cycle count of the two-iteration methods used in previous approaches. Moreover, we propose using the 4×4 Hadamard transform instead of the 8×8 Hadamard transform as the cost function for the H.264 high profiles, without significant loss of video quality. With these techniques, the resulting architecture saves 20% of area and provides over 40% higher throughput than previous work, and it can support HDTV applications.

  • A zero-skipping multi-symbol CAVLC decoder for MPEG-4 AVC/H.264
    2006 IEEE International Symposium on Circuits and Systems, 2006
    Co-Authors: Guo-shiuan Yu, Tian-sheuan Chang
    Abstract:

    This paper presents a high-performance CAVLC decoding VLSI architecture for MPEG-4 AVC/H.264. Instead of only skipping zero blocks, the proposed design exploits features of the CAVLC decoding process to efficiently skip processing steps when there is nothing to decode, and it can decode multiple symbols in the sign and run_before stages. The proposed design needs only 90 cycles on average to decode one macroblock (MB), which meets real-time HDTV requirements, and it saves 64% of the cycle count on average compared with the previous design. The hardware cost is about 13,192 gates when synthesized at 125 MHz.

  • A Memory Bandwidth Optimized Interpolator for Motion Compensation in the H.264 Video Decoding
    APCCAS 2006 - 2006 IEEE Asia Pacific Conference on Circuits and Systems, 2006
    Co-Authors: Tian-sheuan Chang
    Abstract:

    The paper presents an interpolator design for motion compensation in H.264 video decoding. The design is optimized for the available data bandwidth to avoid idle hardware. In addition, the required memory access is further reduced by optimizing the interpolation window. The implementation shows that the design saves about 10% of silicon area, or at least seven interpolation filters, compared with previous works. Moreover, the interpolation window optimization reduces the cycle count of motion compensation by 12.5% to 71.3%. Finally, the architecture can easily be adjusted for different memory bandwidths.

  • High Performance Context Adaptive Variable Length Coding Encoder for MPEG-4 AVC/H.264 Video Coding
    APCCAS 2006 - 2006 IEEE Asia Pacific Conference on Circuits and Systems, 2006
    Co-Authors: Min-chi Tsai, Tian-sheuan Chang
    Abstract:

    This paper presents a high-performance VLSI architecture for context adaptive variable length coding (CAVLC) in MPEG-4 AVC/H.264 video coding. Instead of only the coarse-grained 8×8 zero-block skipping of the previous design, the proposed design implements fine-grained zero skipping at the 4×4 block level and at the individual coefficient level. The implementation in a 0.18 µm CMOS process needs only 6.88 cycles on average to code one block and costs 11.9K gates when working at 100 MHz. This design saves more than half of the cycle count and 48% of the area cost compared with the other designs.
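
The single-iteration fractional-pel motion estimation abstract above uses a 4×4 Hadamard transform as the matching cost. This cost is commonly computed as the sum of absolute transformed differences (SATD): the 4×4 residual is transformed by a Hadamard matrix H on both sides, and the magnitudes of the result are summed. The following minimal Python sketch computes a 4×4 SATD. The final halving follows a convention used in many H.264 encoders, but normalization varies between implementations and is an assumption here. The sketch does not attempt to reproduce the paper's six-point search.

```python
# 4x4 Hadamard matrix (Sylvester order); it is symmetric, so H4 equals its transpose.
H4 = [
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
]

def matmul(a, b):
    """Multiply two 4x4 integer matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd4x4(cur, ref):
    """Sum of absolute Hadamard-transformed differences for one 4x4 block."""
    diff = [[cur[i][j] - ref[i][j] for j in range(4)] for i in range(4)]
    t = matmul(matmul(H4, diff), H4)  # H * D * H^T
    return sum(abs(v) for row in t for v in row) // 2

ref = [[0] * 4 for _ in range(4)]
flat = [[1] * 4 for _ in range(4)]
print(satd4x4(flat, ref))  # 8 (the plain SAD of the same residual is 16)
```

A flat (DC-only) residual concentrates its energy in a single transform coefficient, so the SATD charges less for it than the SAD does. This frequency-aware behavior is why Hadamard-based costs are popular for sub-pel refinement.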

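Both CAVLC abstracts above rely on skipping work for zero-valued data, at the block level and at the coefficient level. As a rough illustration, the following Python sketch derives the CAVLC syntax-element values of one 4×4 block given in zigzag scan order: TotalCoeff, TrailingOnes, total_zeros and the run_before values. It returns early for an all-zero block and stops emitting run_before once no zeros remain. It follows the H.264 definitions of these elements as we understand them, but it omits the VLC tables, level coding and any notion of hardware cycles. The function `cavlc_symbols` is hypothetical and is not the papers' architecture.

```python
def cavlc_symbols(coeffs):
    """Return CAVLC syntax-element values for one 4x4 block, or None to skip it.

    `coeffs` holds the 16 quantized coefficients in zigzag scan order.
    """
    assert len(coeffs) == 16
    positions = [i for i, c in enumerate(coeffs) if c != 0]
    if not positions:  # block-level skip: an all-zero block has nothing to code
        return None
    # Trailing ones: up to three +/-1 values at the high-frequency end.
    trailing_ones = 0
    for level in (coeffs[i] for i in reversed(positions)):
        if abs(level) != 1 or trailing_ones == 3:
            break
        trailing_ones += 1
    # Zeros that precede the last nonzero coefficient in scan order.
    total_zeros = positions[-1] + 1 - len(positions)
    # run_before for each coefficient, highest frequency first; the lowest-
    # frequency coefficient's run is implied by whatever zeros are left.
    runs, zeros_left = [], total_zeros
    rev = list(reversed(positions))
    for cur, nxt in zip(rev, rev[1:]):
        if zeros_left == 0:  # coefficient-level skip: remaining runs are all zero
            break
        run = cur - nxt - 1
        runs.append(run)
        zeros_left -= run
    return {"TotalCoeff": len(positions), "TrailingOnes": trailing_ones,
            "total_zeros": total_zeros, "run_before": runs}

print(cavlc_symbols([0] * 16))  # None
print(cavlc_symbols([0, 3, -1, 0, 0, -1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]))
# {'TotalCoeff': 5, 'TrailingOnes': 3, 'total_zeros': 4, 'run_before': [1, 0, 2, 0]}
```

The two early exits mirror, in software, the block-level and coefficient-level skipping that the abstracts describe in hardware.
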
James E. Smith - One of the best experts on this subject based on the ideXlab platform.

  • Studying Compiler-Microarchitecture Interactions through Interval Analysis
    16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007
    Co-Authors: Stijn Eyerman, Lieven Eeckhout, James E. Smith
    Abstract:

    In practice, the only way to determine the performance gain (or loss) of a given compiler optimization is to run optimized programs on the hardware and time them. This method, while useful, provides no insight into the underlying causes of the performance gain or loss. Using the recently proposed method of interval analysis, one can decompose total execution time into intuitively meaningful cycle components. These components include a base cycle count, which measures the time required to execute the program in the absence of all disruptive miss events, along with additional cycle counts for each type of miss event. The performance gain (or loss) resulting from a compiler optimization can then be attributed either to the base cycle count or to specific miss event(s).
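
The interval-analysis abstract above splits total cycles into a base component plus a penalty for each type of miss event. The following minimal Python sketch shows that bookkeeping under a simple first-order assumption of our own: the base cycle count is the instruction count divided by the dispatch width, and each miss event adds a fixed per-type penalty. The penalty values and event counts are hypothetical. Real interval models estimate penalties more carefully, for example by accounting for overlapping long-latency misses.

```python
from math import ceil

def cycle_stack(instructions, dispatch_width, miss_counts, penalties):
    """Return a cycle breakdown: base cycles plus one component per miss type."""
    stack = {"base": ceil(instructions / dispatch_width)}
    for event, count in miss_counts.items():
        stack[event] = count * penalties[event]
    stack["total"] = sum(stack.values())
    return stack

def attribute(before, after):
    """Attribute the change in cycles to each component (negative = cycles saved)."""
    return {k: after[k] - before[k] for k in before if k != "total"}

PENALTIES = {"branch_mispredict": 15, "L2_miss": 200}  # hypothetical cycles per event
baseline = cycle_stack(1_000_000, 4, {"branch_mispredict": 5_000, "L2_miss": 1_000}, PENALTIES)
optimized = cycle_stack(900_000, 4, {"branch_mispredict": 6_000, "L2_miss": 1_000}, PENALTIES)
print(baseline["total"], optimized["total"])  # 525000 515000
print(attribute(baseline, optimized))
# {'base': -25000, 'branch_mispredict': 15000, 'L2_miss': 0}
```

In this made-up scenario, the net gain of 10,000 cycles hides two opposing effects. The optimization saves 25,000 base cycles but loses 15,000 cycles to extra branch mispredictions. This is the kind of insight that the abstract says plain timing cannot provide.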