Kernel Invocation

The Experts below are selected from a list of 72 Experts worldwide ranked by ideXlab platform

Vibha Patel - One of the best experts on this subject based on the ideXlab platform.

  • A Shared Memory Based Implementation of Needleman-Wunsch Algorithm Using Skewing Transformation
    International Journal of Advanced Research in Computer Science, 2017
    Co-Authors: Vibha Patel, Krunal Gandhi, Darshak Bhatti
    Abstract:

    Among the various algorithms for protein and nucleotide alignment, the Needleman-Wunsch algorithm is widely accepted because it divides the problem into sub-problems. We present two parallel approaches to the Needleman-Wunsch algorithm, one with single-kernel invocation and one with multi-kernel invocation, both using a skewing transformation to traverse and compute the dynamic programming matrix. Compared with the traditional sequential CPU approach, these yield a six-fold performance improvement. Furthermore, we apply the same single-kernel idea to shared memory, which yields a two-fold performance improvement over our non-shared-memory approach.

  • A GPU Based Implementation of Needleman-Wunsch Algorithm Using Skewing Transformation
    2015 Eighth International Conference on Contemporary Computing (IC3), 2015
    Co-Authors: Anuj Chaudhary, Deepkumar Kagathara, Vibha Patel
    Abstract:

    We present a new parallel approach to the Needleman-Wunsch algorithm for global sequence alignment. This approach uses a skewing transformation for traversal and calculation of the dynamic programming matrix. We compare the execution time of a sequential CPU-based implementation with two parallel GPU-based implementations: single-kernel invocation with lock-free block synchronization, and multi-kernel invocation at block-synchronization points. Both GPU-based implementations gave up to 6 times performance improvement over the sequential CPU-based implementation.

Zebo Peng - One of the best experts on this subject based on the ideXlab platform.

  • Latency-Aware Packet Processing on CPU-GPU Heterogeneous Systems
    Proceedings of the 54th Annual Design Automation Conference (DAC), 2017
    Co-Authors: Arian Maghazeh, Unmesh D. Bordoloi, Usman Dastgeer, Alexandru Andrei, Petru Eles, Zebo Peng
    Abstract:

    In response to the tremendous growth of the Internet, towards what we call the Internet of Things (IoT), there is a need to move from costly, high-time-to-market specific-purpose hardware to flexible, low-time-to-market general-purpose devices for packet processing. Among several such devices, GPUs have attracted attention in the past, mainly because the high computing demand of packet processing applications can, potentially, be satisfied by these throughput-oriented machines. However, another important aspect of such applications is packet latency, which, if not handled carefully, will overshadow the throughput benefits. Unfortunately, until now, this aspect has been mostly ignored. To address this issue, we propose a method that considers the variable bit rate of the traffic and, depending on the current rate, minimizes the latency while meeting the rate demand. We propose a persistent-kernel-based software architecture to overcome the challenges inherent in GPU implementations, such as kernel invocation overhead, CPU-GPU communication, and memory access overhead. We have chosen packet classification as the packet processing application to demonstrate our technique. Using the proposed approach, we are able to reduce the packet latency on average by a factor of 3.5, compared to the state-of-the-art solutions, without any packet drop.
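
The persistent-kernel idea in the abstract above can be pictured with a CPU-only analogy. It is not the authors' architecture and it uses no GPU. A worker is started once and then polls a queue for batches, instead of being started anew for every batch, which is how a persistent kernel avoids paying the invocation overhead per batch. The classification rule, the packet layout (a dict with a `dport` field), and all function names are hypothetical.

```python
import queue
import threading


def classify(packet):
    # Hypothetical rule: label a packet by its destination port.
    return "web" if packet["dport"] in (80, 443) else "other"


def persistent_worker(inbox, outbox):
    """Started once; keeps polling for work until it receives None."""
    while True:
        batch = inbox.get()
        if batch is None:  # sentinel: shut the worker down
            break
        outbox.put([classify(p) for p in batch])


def run_persistent(batches):
    """Process every batch with a single long-lived worker."""
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=persistent_worker, args=(inbox, outbox))
    worker.start()  # the only "invocation"
    results = []
    for batch in batches:
        inbox.put(batch)             # hand over work, no new launch
        results.append(outbox.get())  # wait for this batch's labels
    inbox.put(None)
    worker.join()
    return results
```

In this sketch the batch size is whatever the caller passes in. The abstract describes choosing it according to the current traffic rate, which is not modelled here.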

Bin Gong - One of the best experts on this subject based on the ideXlab platform.

  • Option Pricing on the GPU with Backward Stochastic Differential Equation
    2011 Fourth International Symposium on Parallel Architectures Algorithms and Programming, 2011
    Co-Authors: Ying Peng, Bin Gong
    Abstract:

    In this paper, we develop acceleration strategies for option pricing with the non-linear Backward Stochastic Differential Equation (BSDE), which appears as a robust and valuable tool in financial markets. An efficient binomial-lattice-based method is adopted to solve the BSDE numerically. To reduce the frequency of global memory accesses, we avoid performing a kernel invocation at each time step. Furthermore, to evaluate the effect of load balance on performance, we provide two different acceleration strategies and compare them in running-time experiments. The acceleration algorithms exhibit tremendous speedup over the sequential CPU implementation and are therefore suitable for real-time applications.
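
To see why a lattice method invites one kernel invocation per time step, consider the backward sweep of a binomial lattice. The sketch below prices a plain European call with the standard Cox-Ross-Rubinstein lattice. This is a simpler, linear stand-in for the non-linear BSDE scheme in the abstract, not the authors' method, and the function and parameter names are assumptions. Within one time step all nodes are independent, while consecutive steps must run in order, so a naive GPU port would launch a kernel for each iteration of the outer loop.

```python
import math


def crr_european_call(s0, strike, rate, sigma, maturity, steps):
    """European call price on a Cox-Ross-Rubinstein binomial lattice."""
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    p = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral up probability
    disc = math.exp(-rate * dt)

    # Terminal payoffs: node j has j up-moves and (steps - j) down-moves.
    values = [max(s0 * up**j * down**(steps - j) - strike, 0.0)
              for j in range(steps + 1)]

    # Backward sweep. Each pass of this outer loop is one time step; the
    # nodes inside a pass are independent (the parallel part), but the
    # passes themselves are sequential.
    for step in range(steps, 0, -1):
        values = [disc * (p * values[j + 1] + (1.0 - p) * values[j])
                  for j in range(step)]
    return values[0]
```

A call such as `crr_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 500)` performs 500 sequential sweeps, which is the count of launches a one-kernel-per-step design would incur.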

Anuj Chaudhary - One of the best experts on this subject based on the ideXlab platform.

  • A GPU Based Implementation of Needleman-Wunsch Algorithm Using Skewing Transformation
    2015 Eighth International Conference on Contemporary Computing (IC3), 2015
    Co-Authors: Anuj Chaudhary, Deepkumar Kagathara, Vibha Patel
    Abstract:

    We present a new parallel approach to the Needleman-Wunsch algorithm for global sequence alignment. This approach uses a skewing transformation for traversal and calculation of the dynamic programming matrix. We compare the execution time of a sequential CPU-based implementation with two parallel GPU-based implementations: single-kernel invocation with lock-free block synchronization, and multi-kernel invocation at block-synchronization points. Both GPU-based implementations gave up to 6 times performance improvement over the sequential CPU-based implementation.

Darshak Bhatti - One of the best experts on this subject based on the ideXlab platform.

  • A Shared Memory Based Implementation of Needleman-Wunsch Algorithm Using Skewing Transformation
    International Journal of Advanced Research in Computer Science, 2017
    Co-Authors: Vibha Patel, Krunal Gandhi, Darshak Bhatti
    Abstract:

    Among the various algorithms for protein and nucleotide alignment, the Needleman-Wunsch algorithm is widely accepted because it divides the problem into sub-problems. We present two parallel approaches to the Needleman-Wunsch algorithm, one with single-kernel invocation and one with multi-kernel invocation, both using a skewing transformation to traverse and compute the dynamic programming matrix. Compared with the traditional sequential CPU approach, these yield a six-fold performance improvement. Furthermore, we apply the same single-kernel idea to shared memory, which yields a two-fold performance improvement over our non-shared-memory approach.