The Experts below are selected from a list of 15687 Experts worldwide ranked by ideXlab platform
Qingguang Huang - One of the best experts on this subject based on the ideXlab platform.
-
Enabling loop fusion and tiling for cache performance by fixing fusion-preventing data dependences
International Conference on Parallel Processing, 2005. Co-Authors: Qingguang Huang. Abstract: This paper presents a new approach to enabling loop fusion and tiling for arbitrary affine loop nests. Given a set of multiple loop nests, we present techniques that automatically eliminate all the fusion-preventing dependences by means of loop tiling and array copying. Applying our techniques iteratively to multiple loop nests yields a single loop nest that can be tiled for cache locality. Our approach handles LU, QR, Cholesky and Jacobi in a unified framework. Our experimental evaluation on an SGI Octane2 system shows that the benefit from the significantly reduced L1 and L2 cache misses has far more than offset the branching and loop control overhead introduced by our approach.
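To make the term "fusion-preventing dependence" in the abstract above concrete, here is a minimal Python sketch. It is not the paper's technique, which relies on loop tiling and array copying. It uses plain loop shifting on a hypothetical pair of loops, and all function and array names are assumptions made for illustration. The second loop reads a[i + 1], so fusing the two loops naively would read a value before the first loop body has written it.

```python
def separate_loops(b):
    """Reference version: two separate loops, always correct."""
    n = len(b)
    a = [0] * n
    c = [0] * max(n - 1, 0)
    for i in range(n):
        a[i] = b[i] + 1
    for i in range(n - 1):
        c[i] = a[i + 1] * 2
    return a, c


def naive_fused(b):
    """Illegal fusion: c[i] reads a[i + 1] before it has been written."""
    n = len(b)
    a = [0] * n
    c = [0] * max(n - 1, 0)
    for i in range(n):
        a[i] = b[i] + 1
        if i < n - 1:
            c[i] = a[i + 1] * 2  # a[i + 1] still holds its initial 0 here
    return a, c


def shifted_fused(b):
    """Legal fusion after shifting the second statement by one iteration."""
    n = len(b)
    a = [0] * n
    c = [0] * max(n - 1, 0)
    for i in range(n):
        a[i] = b[i] + 1
        if i >= 1:
            c[i - 1] = a[i] * 2  # a[i] was written earlier in this iteration
    return a, c


b = [1, 2, 3, 4]
print(separate_loops(b))  # reference result
print(naive_fused(b))     # differs from the reference in c
print(shifted_fused(b))   # matches the reference
```

The shifted version produces the same arrays as the two separate loops, while the naive version does not, which is what makes the original dependence fusion-preventing.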
Chun-wei Gu - One of the best experts on this subject based on the ideXlab platform.
-
Parametric analysis of a dual loop Organic Rankine Cycle (ORC) system for engine waste heat recovery
Energy Conversion and Management, 2015. Co-Authors: Jian Song, Chun-wei Gu. Abstract: This paper presents a dual loop Organic Rankine Cycle (ORC) system consisting of a high temperature (HT) loop and a low temperature (LT) loop for engine waste heat recovery. The HT loop recovers the waste heat of the engine exhaust gas, and the LT loop recovers that of the jacket cooling water in addition to the residual heat of the HT loop. The two loops are coupled via a shared heat exchanger, which means that the condenser of the HT loop is the evaporator of the LT loop as well. Cyclohexane, benzene and toluene are selected as the working fluids of the HT loop. Different condensation temperatures of the HT loop are set to maintain the condensation pressure slightly higher than atmospheric pressure. R123, R236fa and R245fa are chosen for the LT loop. Parametric analysis is conducted to evaluate the influence of the HT loop condensation temperature and the residual heat load on the LT loop. The simulation results reveal that under different condensation conditions of the HT loop, the pinch point of the LT loop appears at different locations, resulting in different evaporation temperatures and other thermal parameters. With cyclohexane for the HT loop and R245fa for the LT loop, the maximum net power output of the dual loop ORC system reaches 111.2 kW. Since the original power output of the engine is 996 kW, the additional power generated by the dual loop ORC system can increase the engine power by 11.2%.
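The percentage quoted at the end of the abstract above can be checked from the two power figures it gives. The short Python sketch below does only that arithmetic; the variable names are illustrative and not taken from the paper.

```python
# Figures quoted in the abstract
engine_power_kw = 996.0    # original engine power output
orc_net_power_kw = 111.2   # maximum net power of the dual loop ORC system

# Relative increase in power output, in percent
increase_pct = 100.0 * orc_net_power_kw / engine_power_kw

print(round(increase_pct, 1))  # 11.2
```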
Sithamparanathan Kandeepan - One of the best experts on this subject based on the ideXlab platform.
-
Steady state distribution of a hyperbolic digital tanlock loop with extended pull-in range for frequency synchronization in high Doppler environment
IEEE Transactions on Wireless Communications, 2009. Co-Authors: Sithamparanathan Kandeepan. Abstract: A hyperbolic arctan based digital tanlock loop (D-TLL) operating with complex signals at base-band or intermediate frequencies in high Doppler environments is treated here. The arctan based loop, known as the tanlock loop (TLL), is used in software defined radio architectures for frequency acquisition and tracking. The hyperbolic nonlinearity intentionally introduced within the phase detector extends the pull-in range of the frequency for a given loop, compared to the normal D-TLL, allowing a wider frequency acquisition range which is suitable for high Doppler communications environments. In this paper we study the steady state phase noise performance of such a feedback loop for additive Gaussian noise using stochastic analysis. The stochastic model of a first-order hyperbolic loop and the theoretical analysis for the corresponding statistical distribution of the closed loop steady state phase noise are presented. The theoretical results are also verified by simulations.
-
Acquisition performance of a digital phase-locked loop with a four-quadrant arctan phase detector
International Symposium on Intelligent Signal Processing and Communication Systems, 2004. Co-Authors: Sithamparanathan Kandeepan, Sam Reisenfeld. Abstract: The acquisition performance of a digital phase locked loop (DPLL) with a four-quadrant arctan based phase detector (PD) is discussed. In the noiseless case, unlike the traditional sine function based phase locked loops, the acquisition process of the four-quadrant arctan based phase locked loops is less tedious. We look into the pull-in process together with a time-series analysis of the DPLL for the noiseless case. The phase-plane portrait of the loop is also discussed, for both the noiseless and the noisy conditions.
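As a rough illustration of the noiseless pull-in behaviour mentioned in the abstract above, the following Python sketch simulates a generic first-order loop with a four-quadrant arctan phase detector and a constant input phase offset. It is an assumed textbook-style model, not the loop analysed in the paper. The function name, the loop gain and the initial offset are all hypothetical choices. Because atan2 returns the wrapped phase difference directly, the error in this model shrinks by a factor of (1 - loop_gain) at every step.

```python
import math


def run_first_order_dpll(phase_offset, loop_gain, steps):
    """Simulate a noiseless first-order DPLL and return the phase errors.

    phase_offset: constant input phase in radians (assumed, no frequency offset)
    loop_gain:    first-order loop gain, assumed to lie in (0, 2) for convergence
    steps:        number of loop iterations
    """
    theta = 0.0  # phase estimate held by the loop
    errors = []
    for _ in range(steps):
        diff = phase_offset - theta
        # Four-quadrant arctan phase detector: wraps the error into (-pi, pi]
        err = math.atan2(math.sin(diff), math.cos(diff))
        errors.append(err)
        theta += loop_gain * err  # first-order update of the phase estimate
    return errors


errors = run_first_order_dpll(phase_offset=2.5, loop_gain=0.5, steps=50)
print(errors[0], errors[10], errors[-1])
```

With these assumed values the phase error decays geometrically from the initial 2.5 rad towards zero, with no dependence on the sine of the error as in a classical sinusoidal phase detector.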
Qingfeng Zhuge - One of the best experts on this subject based on the ideXlab platform.
-
Loop Distribution and Fusion with Timing and Code Size Optimization
Journal of Signal Processing Systems, 2011. Co-Authors: Qingfeng Zhuge, Yi He. Abstract: In this paper, a technique that combines loop distribution with maximum direct loop fusion (LD_MDF) is proposed. The technique performs maximum loop distribution, followed by maximum direct loop fusion, to optimize timing and code size simultaneously. The loop distribution theorems that state the conditions for distributing any multi-level nested loop in the maximum way are proved. It is proved that the statements involved in the dependence cycle can be fully distributed if the summation of the edge weights of the dependence cycle satisfies a certain condition; otherwise, the statements should be put in the same loop after loop distribution. Based on the loop distribution theorems, algorithms are designed to conduct maximum loop distribution. The maximum direct loop fusion problem is mapped to the graph partitioning problem. A polynomial graph partitioning algorithm is developed to compute the fusion partitions. It is proved that the proposed maximum direct loop fusion algorithm produces the fewest resultant loop nests without violating dependence constraints. It is also shown that the resultant code size of the fused loops by the technique of loop distribution with maximum direct loop fusion is smaller than the code size of the original loops when the number of fused loops is less than the number of the original loops. The simulation results are presented to validate the proposed technique.
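For readers unfamiliar with loop distribution, the Python sketch below shows the basic transformation on a hypothetical single-level loop with two statements and no dependence cycle. It does not implement the paper's theorems or its edge-weight condition for multi-level nests. The statement labels S1 and S2 and all function names are assumptions made for illustration. S2 only reads a value that S1 produced in an earlier iteration, so running all of S1 first and then all of S2 preserves the result.

```python
def original_loop(b):
    """Single loop containing two statements, S1 and S2."""
    n = len(b)
    a = [0] * n
    d = [0] * n
    for i in range(n):
        a[i] = b[i] + 1          # S1
        if i > 0:
            d[i] = a[i - 1] * 2  # S2 depends on S1 from the previous iteration
    return a, d


def distributed_loops(b):
    """Same computation after loop distribution: one loop per statement."""
    n = len(b)
    a = [0] * n
    d = [0] * n
    for i in range(n):
        a[i] = b[i] + 1          # S1 only
    for i in range(1, n):
        d[i] = a[i - 1] * 2      # S2 only, all values of a are already available
    return a, d


b = [1, 2, 3]
print(original_loop(b))
print(distributed_loops(b))
```

Both versions return the same arrays here because the only dependence runs from S1 to S2. If S1 also read a value written by S2, the two statements would form a dependence cycle, which is the case the abstract's condition addresses.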
-
ISPAN - Maximum loop distribution and fusion for two-level loops considering code size
8th International Symposium on Parallel Architectures Algorithms and Networks (ISPAN'05), 2005. Co-Authors: Qingfeng Zhuge, Zili Shao. Abstract: In this paper, we propose a technique combining loop distribution with loop fusion to improve the timing performance without increasing the code size of the transformed loops. We first develop the loop distribution theorems that state the conditions for distributing any two-level nested loop in the maximum way. Based on the loop distribution theorems, we design an algorithm to conduct maximum loop distribution. Then we propose a technique of maximum loop distribution with direct loop fusion, which performs maximum loop distribution followed by direct loop fusion. The experimental results show that the execution time of the loops transformed by our technique is reduced by 41.9% on average compared to the original loops, without any increase in code size.
-
Loop distribution and fusion with timing and code size optimization for embedded DSPs
Embedded and Ubiquitous Computing, 2005. Co-Authors: Qingfeng Zhuge, Zili Shao. Abstract: Loop distribution and loop fusion are two effective loop transformation techniques to optimize the execution of the programs in DSP applications. In this paper, we propose a new technique combining loop distribution with direct loop fusion, which will improve the timing performance without jeopardizing the code size. We first develop the loop distribution theorems that state the legality conditions of loop distribution for multi-level nested loops. We show that if the summation of the edge weights of the dependence cycle satisfies a certain condition, then the statements involved in the dependence cycle can be distributed; otherwise, they should be put in the same loop after loop distribution. Then, we propose the technique of maximum loop distribution with direct loop fusion. The experimental results show that the execution time of the loops transformed by our technique is reduced by 21.0% compared to the original loops, and the code size of the transformed loops is reduced by 7.0% on average compared to the original loops.
-
ISSS - Scheduling and partitioning for multiple loop nests
Proceedings of the 14th international symposium on Systems synthesis - ISSS '01, 2001. Co-Authors: Zhong Wang, Qingfeng Zhuge. Abstract: This paper presents the multiple loop partition scheduling technique, which combines loop partitioning and prefetching. It exploits data locality better than loop fusion and better than traditional loop partitioning, which focuses only on a single nested loop. Moreover, multiple loop partition scheduling balances computation and memory loading, such that long memory latency can be hidden effectively. The experiments show that multiple loop partition scheduling achieves significant improvement over existing methods.
Jian Song - One of the best experts on this subject based on the ideXlab platform.
-
Performance analysis of a dual-loop organic Rankine cycle (ORC) system with wet steam expansion for engine waste heat recovery
Applied Energy, 2015. Co-Authors: Jian Song, C Gu. Abstract: A dual-loop organic Rankine cycle (ORC) system is designed to recover the waste heat of a diesel engine. The high-temperature (HT) loop utilizes the heat load of the engine exhaust gas, and the low-temperature (LT) loop uses the heat load of the jacket cooling water and the residual heat of the HT loop sequentially. These two loops are coupled via a shared heat exchanger. Water is selected as the working fluid for the HT loop and wet steam expansion, which can be implemented through screw expanders, is exploited. The dryness fraction of the wet steam at the inlet of the expander can be adjusted to attain a suitable evaporation temperature and provide a better temperature match with the heat source. The working fluid candidates for the LT loop are chosen to be R123, R236fa and R245fa. The influence of the HT loop parameters on the performance of the LT loop is evaluated. The simulation results reveal that under different operating conditions of the HT loop, the pinch point of the LT loop occurs at different locations and therefore results in different evaporation temperatures and other thermal parameters. The maximum net power output of the dual-loop ORC system reaches 115.1 kW, which leads to an increase of 11.6% on the original power output of the diesel engine.
-
Parametric analysis of a dual loop Organic Rankine Cycle (ORC) system for engine waste heat recovery
Energy Conversion and Management, 2015. Co-Authors: Jian Song, Chun-wei Gu. Abstract: This paper presents a dual loop Organic Rankine Cycle (ORC) system consisting of a high temperature (HT) loop and a low temperature (LT) loop for engine waste heat recovery. The HT loop recovers the waste heat of the engine exhaust gas, and the LT loop recovers that of the jacket cooling water in addition to the residual heat of the HT loop. The two loops are coupled via a shared heat exchanger, which means that the condenser of the HT loop is the evaporator of the LT loop as well. Cyclohexane, benzene and toluene are selected as the working fluids of the HT loop. Different condensation temperatures of the HT loop are set to maintain the condensation pressure slightly higher than atmospheric pressure. R123, R236fa and R245fa are chosen for the LT loop. Parametric analysis is conducted to evaluate the influence of the HT loop condensation temperature and the residual heat load on the LT loop. The simulation results reveal that under different condensation conditions of the HT loop, the pinch point of the LT loop appears at different locations, resulting in different evaporation temperatures and other thermal parameters. With cyclohexane for the HT loop and R245fa for the LT loop, the maximum net power output of the dual loop ORC system reaches 111.2 kW. Since the original power output of the engine is 996 kW, the additional power generated by the dual loop ORC system can increase the engine power by 11.2%.