Cycle Count - Explore the Science & Experts

The Experts below are selected from a list of 3747 Experts worldwide ranked by the ideXlab platform.

Ren-song Tsay - One of the best experts on this subject based on the ideXlab platform.

A Cycle Count Accurate TLM bus modeling approach

2013 International Symposium on VLSI Design Automation and Test (VLSI-DAT), 2013

Co-Authors: Mao-lin Li, Chen-kang Lo, Li-chun Chen, Ren-song Tsay

Abstract:

This paper presents an effective Cycle-Count Accurate Transaction Level Modeling (CCA-TLM) and simulation technique for a point-to-point bus. We propose a two-phase bus arbitration model and an FSM-based Composite Master-Slave-pair and Arbiter Transaction (CMSAT) model for efficient and accurate dynamic simulations. This approach is particularly effective for bus architecture validation and contention analysis of complex Multi-Processor System-on-Chip (MPSoC) designs. The experiment results show that the proposed approach performs 23 times faster than the Cycle-Accurate (CA) bus model while maintaining 100% accurate timing information at every transaction boundary.
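
The idea of keeping timing exact only at transaction boundaries can be illustrated with a toy sketch. The Python below is not the paper's two-phase arbitration or CMSAT model. It assumes a hypothetical single shared bus, a fixed-priority arbiter (the lower master index wins), and a hypothetical `Request` record with distinct entries. It compares a loop that advances one cycle at a time with a loop that jumps from one transaction boundary to the next, and checks that both report the same finish cycle for every transaction.

```python
from collections import namedtuple

# Hypothetical request record: which master asks, when, and for how many cycles.
Request = namedtuple("Request", "master issue_cycle length")


def simulate_per_cycle(requests):
    """Reference model: advance the clock one cycle at a time."""
    pending = sorted(requests, key=lambda r: r.issue_cycle)
    finish = {}
    clock, busy_until, waiting = 0, 0, []
    while len(finish) < len(requests):
        # Requests issued up to this cycle become visible to the arbiter.
        while pending and pending[0].issue_cycle <= clock:
            waiting.append(pending.pop(0))
        if clock >= busy_until and waiting:
            winner = min(waiting, key=lambda r: r.master)  # fixed priority
            waiting.remove(winner)
            busy_until = clock + winner.length
            finish[winner] = busy_until
        clock += 1
    return finish


def simulate_per_transaction(requests):
    """Boundary model: evaluate only when the bus is free or a request arrives."""
    pending = sorted(requests, key=lambda r: r.issue_cycle)
    finish = {}
    clock, waiting = 0, []
    while len(finish) < len(requests):
        while pending and pending[0].issue_cycle <= clock:
            waiting.append(pending.pop(0))
        if not waiting:
            clock = pending[0].issue_cycle  # bus idle: skip to the next request
            continue
        winner = min(waiting, key=lambda r: r.master)  # fixed priority
        waiting.remove(winner)
        clock += winner.length  # jump straight to the end of the transaction
        finish[winner] = clock
    return finish


# Master 2 asks before master 1, but master 1 has the higher priority.
requests = [
    Request(0, 0, 4),
    Request(2, 1, 2),
    Request(1, 2, 3),
    Request(0, 20, 4),
]
assert simulate_per_cycle(requests) == simulate_per_transaction(requests)
for request, end in sorted(simulate_per_transaction(requests).items(), key=lambda kv: kv[1]):
    print(request, "finishes at cycle", end)
```

In this toy, the boundary loop runs once per transaction or idle gap rather than once per clock cycle, while the finish cycles stay identical. The sketch makes no claim about the speedup reported in the abstract.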

Cycle-Count-accurate processor modeling for fast and accurate system-level simulation

2011 Design Automation & Test in Europe, 2011

Co-Authors: Chen-kang Lo, Li-chun Chen, Meng-huan Wu, Ren-song Tsay

Abstract:

Ideally, system-level simulation should provide a high simulation speed with sufficient timing details for both functional verification and performance evaluation. However, existing Cycle-accurate (CA) and Cycle-approximate (CX) processor models either incur low simulation speeds due to excessive timing details or low accuracy due to simplified timing models. To achieve high simulation speeds while maintaining timing accuracy of the system simulation, we propose a first Cycle-Count-accurate (CCA) processor modeling approach which pre-abstracts internal pipeline and cache into models with accurate Cycle Count information and guarantees accurate timing and functional behaviors on processor interface. The experimental results show that the CCA model performs 50 times faster than the corresponding CA model while providing the same execution Cycle Count information as the target RTL model.

A Cycle Count accurate timing model for fast memory simulation

2010

Co-Authors: Yilen Lo, Li-chun Chen, Mao-lin Li, Ren-song Tsay

Abstract:

In this paper, we propose an effective automatic generation approach for a Cycle-Count Accurate Memory Model (CCAMM) from the Clocked Finite State Machine (CFSM) of the Cycle Accurate Memory Model (CAMM). Since memory accesses are gradually dominating system activities, a correct and efficient memory timing model is essential to system-level simulation. In general, a CCAMM provides sufficient timing accuracy with low simulation overhead, and hence is preferred over the Simple Fixed Delay Model (SFDM), which has low accuracy, or the CAMM, which has low performance. Our proposed approach can systematically generate the CCAMM and guarantee correctness. The experimental results show that the generated model is as accurate as the Register Transfer Level (RTL) model while running 100X faster.
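
The difference between a cycle-accurate and a cycle-count-accurate memory model can be sketched with a toy clocked FSM. The Python below is only an illustration, not the generation procedure proposed in the paper. The state names, the per-state durations in `STATE_CYCLES`, and the row-hit shortcut are all hypothetical. The cycle-accurate function evaluates the FSM once per clock, while the cycle-count-accurate function collapses each path into one precomputed latency.

```python
# Hypothetical per-state durations (in clock cycles) of a toy memory read FSM.
STATE_CYCLES = {"PRECHARGE": 3, "ACTIVATE": 3, "READ": 2, "BURST": 4}


def path(row_hit):
    """States visited by one read; a row hit skips PRECHARGE and ACTIVATE."""
    if row_hit:
        return ("READ", "BURST")
    return ("PRECHARGE", "ACTIVATE", "READ", "BURST")


def cycle_accurate_latency(row_hit):
    """CA-style: evaluate the FSM once per clock cycle."""
    clock = 0
    for state in path(row_hit):
        remaining = STATE_CYCLES[state]
        while remaining > 0:  # one iteration models one clock cycle
            clock += 1
            remaining -= 1
    return clock


# CCA-style: collapse each path into a single cycle count before simulation.
CCA_TABLE = {hit: sum(STATE_CYCLES[s] for s in path(hit)) for hit in (True, False)}


def cycle_count_accurate_latency(row_hit):
    """One table lookup per access instead of one FSM evaluation per cycle."""
    return CCA_TABLE[row_hit]


# Both models must agree on the cycle count of every access.
for hit in (True, False):
    assert cycle_accurate_latency(hit) == cycle_count_accurate_latency(hit)
    print("row hit" if hit else "row miss", cycle_count_accurate_latency(hit), "cycles")
```

A fixed-delay model would return one constant for both cases, which is why it loses accuracy. The table keeps the per-path cycle counts while avoiding the per-cycle loop.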

Chen-kang Lo - One of the best experts on this subject based on the ideXlab platform.

A Cycle Count Accurate TLM bus modeling approach

2013 International Symposium on VLSI Design Automation and Test (VLSI-DAT), 2013

Co-Authors: Mao-lin Li, Chen-kang Lo, Li-chun Chen, Ren-song Tsay

Abstract:

This paper presents an effective Cycle-Count Accurate Transaction Level Modeling (CCA-TLM) and simulation technique for a point-to-point bus. We propose a two-phase bus arbitration model and an FSM-based Composite Master-Slave-pair and Arbiter Transaction (CMSAT) model for efficient and accurate dynamic simulations. This approach is particularly effective for bus architecture validation and contention analysis of complex Multi-Processor System-on-Chip (MPSoC) designs. The experiment results show that the proposed approach performs 23 times faster than the Cycle-Accurate (CA) bus model while maintaining 100% accurate timing information at every transaction boundary.

Cycle-Count-accurate processor modeling for fast and accurate system-level simulation

2011 Design Automation & Test in Europe, 2011

Co-Authors: Chen-kang Lo, Li-chun Chen, Meng-huan Wu, Ren-song Tsay

Abstract:

Ideally, system-level simulation should provide a high simulation speed with sufficient timing details for both functional verification and performance evaluation. However, existing Cycle-accurate (CA) and Cycle-approximate (CX) processor models either incur low simulation speeds due to excessive timing details or low accuracy due to simplified timing models. To achieve high simulation speeds while maintaining timing accuracy of the system simulation, we propose a first Cycle-Count-accurate (CCA) processor modeling approach which pre-abstracts internal pipeline and cache into models with accurate Cycle Count information and guarantees accurate timing and functional behaviors on processor interface. The experimental results show that the CCA model performs 50 times faster than the corresponding CA model while providing the same execution Cycle Count information as the target RTL model.

ASP-DAC - Automatic generation of Cycle Accurate and Cycle Count Accurate transaction level bus models from a formal model

2009 Asia and South Pacific Design Automation Conference, 2009

Co-Authors: Chen-kang Lo, Ren-song Tsay

Abstract:

This paper proposes the first automatic approach to simultaneously generate Cycle Accurate and Cycle Count Accurate transaction level bus models. Since TLM (Transaction Level Modeling) is proven as an effective design methodology for managing the ever-increasing complexity of system level designs, researchers often exploit various abstraction levels to gain either simulation speed or accuracy. Consequently, designers repeatedly perform the time-consuming task of re-writing and performing consistency checks for different abstraction level models of the same design. To ease the work, we propose a correct-by-construction method that automatically and simultaneously generates both fast and accurate transaction level bus models for system simulation. The proposed approach relieves designers from the tedious and error-prone process of refining models and checking for consistency.

Mao-lin Li - One of the best experts on this subject based on the ideXlab platform.

A Cycle Count Accurate TLM bus modeling approach

2013 International Symposium on VLSI Design Automation and Test (VLSI-DAT), 2013

Co-Authors: Mao-lin Li, Chen-kang Lo, Li-chun Chen, Ren-song Tsay

Abstract:

This paper presents an effective Cycle-Count Accurate Transaction Level Modeling (CCA-TLM) and simulation technique for a point-to-point bus. We propose a two-phase bus arbitration model and an FSM-based Composite Master-Slave-pair and Arbiter Transaction (CMSAT) model for efficient and accurate dynamic simulations. This approach is particularly effective for bus architecture validation and contention analysis of complex Multi-Processor System-on-Chip (MPSoC) designs. The experiment results show that the proposed approach performs 23 times faster than the Cycle-Accurate (CA) bus model while maintaining 100% accurate timing information at every transaction boundary.

A Cycle Count accurate timing model for fast memory simulation

2010

Co-Authors: Yilen Lo, Li-chun Chen, Mao-lin Li, Ren-song Tsay

Abstract:

In this paper, we propose an effective automatic generation approach for a Cycle-Count Accurate Memory Model (CCAMM) from the Clocked Finite State Machine (CFSM) of the Cycle Accurate Memory Model (CAMM). Since memory accesses are gradually dominating system activities, a correct and efficient memory timing model is essential to system-level simulation. In general, a CCAMM provides sufficient timing accuracy with low simulation overhead, and hence is preferred over the Simple Fixed Delay Model (SFDM), which has low accuracy, or the CAMM, which has low performance. Our proposed approach can systematically generate the CCAMM and guarantee correctness. The experimental results show that the generated model is as accurate as the Register Transfer Level (RTL) model while running 100X faster.

Cycle Count accurate memory modeling in system level design

International Conference on Hardware Software Codesign and System Synthesis, 2009

Co-Authors: Yilen Lo, Mao-lin Li, Ren-song Tsay

Abstract:

In this paper, we propose an effective automatic generation approach for a Cycle-Count Accurate Memory Model (CCAMM) from the Clocked Finite State Machine (CFSM) of the Cycle Accurate Memory Model (CAMM). Since memory accesses are gradually dominating system activities, a correct and efficient memory timing model is essential to system-level simulation. In general, a CCAMM provides sufficient timing accuracy with low simulation overhead, and hence is preferred over the Simple Fixed Delay Model (SFDM), which has low accuracy, or the CAMM, which has low performance. Our proposed approach can systematically generate the CCAMM and guarantee correctness. The experimental results show that the generated model is as accurate as the Register Transfer Level (RTL) model while running 100X faster.

Tian-sheuan Chang - One of the best experts on this subject based on the ideXlab platform.

SIFME: A Single Iteration Fractional-Pel Motion Estimation Algorithm and Architecture for HDTV Sized H.264 Video Coding

International Conference on Acoustics Speech and Signal Processing, 2007

Co-Authors: Tian-sheuan Chang

Abstract:

This paper presents a fast algorithm and VLSI architecture for HDTV-sized H.264 fractional motion estimation. To solve the long computational latency in HD-sized applications, we propose a single-iteration algorithm with only six search points. This single-iteration method halves the Cycle Count of the two-iteration methods in previous approaches. Moreover, we propose to use 4×4 Hadamard instead of 8×8 Hadamard as the cost function for H.264 high profiles without significant video quality loss. With these techniques, the resulting architecture saves 20% of area and improves throughput by over 40% compared with the previous work, and is able to support HDTV applications.

A zero-skipping multi-symbol CAVLC decoder for MPEG-4 AVC/H.264

2006 IEEE International Symposium on Circuits and Systems, 2006

Co-Authors: Guo-shiuan Yu, Tian-sheuan Chang

Abstract:

This paper presents a high-performance CAVLC decoding VLSI architecture for MPEG-4 AVC/H.264. Instead of skipping only zero blocks, the proposed design explores the features of the CAVLC decoding process to efficiently skip processing steps when nothing needs to be decoded, and it can decode multiple symbols in the sign and run-before stages. The proposed design needs an average of 90 Cycles for one MB decoding, which meets the real-time HDTV requirement and saves 64% of the Cycle Count on average compared with the previous design. The hardware cost is about 13192 gates when synthesized at 125 MHz.

A Memory Bandwidth Optimized Interpolator for Motion Compensation in the H.264 Video Decoding

APCCAS 2006 - 2006 IEEE Asia Pacific Conference on Circuits and Systems, 2006

Co-Authors: Tian-sheuan Chang

Abstract:

The paper presents an interpolator design for motion compensation used in H.264 video decoding. The presented design is optimized according to the available data bandwidth to avoid idle hardware. In addition, the required memory access is further reduced by the interpolation window optimization. The implementation shows that the presented design can save about 10% of silicon area, or at least seven interpolation filters, compared with previous works. In addition, 12.5% to 71.3% of the Cycle Count of motion compensation can be saved by the interpolation window optimization. Finally, our architecture can be easily adjusted to different memory bandwidths.

High Performance Context Adaptive Variable Length Coding Encoder for MPEG-4 AVC/H.264 Video Coding

APCCAS 2006 - 2006 IEEE Asia Pacific Conference on Circuits and Systems, 2006

Co-Authors: Min-chi Tsai, Tian-sheuan Chang

Abstract:

This paper presents a high-performance VLSI architecture for context adaptive variable length coding (CAVLC) used in MPEG-4 AVC/H.264 video coding. Instead of only the coarse-grained 8×8 zero block skipping of the previous design, the proposed design implements fine-grained zero skipping at the 4×4 block level and the individual coefficient level. The implementation in a 0.18 µm CMOS process needs an average of 6.88 Cycles for one block coding and costs 11.9K gates when working at 100 MHz. This design saves more than half of the Cycle Count and 48% of the area cost compared with the other designs.

James E. Smith - One of the best experts on this subject based on the ideXlab platform.

Studying Compiler-Microarchitecture Interactions through Interval Analysis

16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007

Co-Authors: Stijn Eyerman, Lieven Eeckhout, James E. Smith

Abstract:

In practice, the only way that the performance gain (or loss) for a given compiler optimization can be determined is by running optimized programs on the hardware and timing them. This method, while useful, does not provide insight regarding the underlying causes for performance gain/loss. By using the recently proposed method of interval analysis, one can decompose total execution time into intuitively meaningful Cycle components. These components include a base Cycle Count, which is a measure of the time required to execute the program in the absence of all disruptive miss events, along with additional Cycle Counts for each type of miss event. Performance gain (or loss) resulting from a compiler optimization can then be attributed to either the base Cycle Count or to specific miss event(s).
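
The decomposition described in this abstract can be written out as simple arithmetic: total cycles equal the base Cycle Count plus the cycles charged to each type of miss event. The Python sketch below uses made-up numbers and hypothetical event names (`l2_dcache_miss`, `branch_mispredict`, `icache_miss`); it does not reproduce the interval analysis itself, only the bookkeeping that attributes a change in total cycles to individual components.

```python
from dataclasses import dataclass, field


@dataclass
class CycleStack:
    """Total execution cycles split into a base part and per-miss-event parts."""
    base: int
    miss_cycles: dict = field(default_factory=dict)  # event name -> cycles

    def total(self):
        return self.base + sum(self.miss_cycles.values())


def attribute_delta(before, after):
    """Per-component change in cycles (after minus before)."""
    delta = {"base": after.base - before.base}
    for event in sorted(set(before.miss_cycles) | set(after.miss_cycles)):
        delta[event] = after.miss_cycles.get(event, 0) - before.miss_cycles.get(event, 0)
    return delta


# Made-up cycle stacks for one program without and with a compiler optimization.
before = CycleStack(
    base=1_000_000,
    miss_cycles={"l2_dcache_miss": 400_000, "branch_mispredict": 150_000, "icache_miss": 50_000},
)
after = CycleStack(
    base=900_000,
    miss_cycles={"l2_dcache_miss": 420_000, "branch_mispredict": 150_000, "icache_miss": 60_000},
)

delta = attribute_delta(before, after)
# The component deltas always add up to the change in total cycles.
assert sum(delta.values()) == after.total() - before.total()
for component, change in delta.items():
    print(f"{component}: {change:+d} cycles")
```

With these invented numbers, the optimization shrinks the base Cycle Count while slightly increasing the cycles lost to cache misses, which is the kind of attribution a single end-to-end timing cannot show.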
