Serial Program


Wlodzimierz Bielecki - One of the best experts on this subject based on the ideXlab platform.

  • Tiling Nussinov’s RNA folding loop nest with a space-time approach
    BMC Bioinformatics, 2019
    Co-Authors: Marek Palkowski, Wlodzimierz Bielecki
    Abstract:

    Background: An RNA primary structure, or sequence, is a single strand considered as a chain of nucleotides from the alphabet AUGC (adenine, uracil, guanine, cytosine). The strand can fold onto itself, i.e., one segment of an RNA sequence may pair with another segment of the same sequence, forming a two-dimensional structure described by a list of complementary base pairs that together yield the minimum energy. That list is called the RNA's secondary structure and is predicted by an RNA folding algorithm. RNA secondary structure prediction is a compute-intensive task that lies at the core of search applications in bioinformatics. Results: We suggest a space-time tiling approach and apply it to generate parallel, cache-effective tiled code for RNA folding using Nussinov's algorithm. Conclusions: Parallel tiled code generated with the suggested space-time loop tiling approach outperforms related codes generated automatically by optimizing compilers as well as manually produced codes. The presented approach enables us to tile all three loops of Nussinov's recurrence, which is not possible with commonly known tiling techniques. The generated parallel tiled code is scalable with respect to the number of parallel threads: increasing the number of threads reduces execution time. Defining speedup as the ratio of the time taken to run the original serial program on one thread to the time taken to run the tiled program on P threads, we achieve super-linear speedup (speedup greater than the number of threads used) for the parallel tiled code against the original serial code for up to 32 threads, and super-linear speedup scalability (speedup increasing with the number of threads) for up to 8 threads. For a single thread, the speedup is about 4.2 on the Intel Xeon machine used for the experiments.
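    Nussinov's recurrence referenced above is a triple loop nest filling a triangular dynamic-programming table. The following minimal serial sketch (plain Python; this is not the paper's tiled code, and the function name and pairing rule are illustrative) shows the three loops that the space-time tiling approach targets:

    ```python
    def nussinov(seq):
        # Serial Nussinov dynamic program: S[i][j] holds the maximum number
        # of complementary base pairs in the subsequence seq[i..j].
        pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
        n = len(seq)
        S = [[0] * n for _ in range(n)]
        for d in range(1, n):          # loop 1: diagonal (subsequence length - 1)
            for i in range(n - d):     # loop 2: start position of the subsequence
                j = i + d
                # pair the endpoints i and j (if complementary) ...
                best = S[i + 1][j - 1] + (1 if (seq[i], seq[j]) in pairs else 0)
                for k in range(i, j):  # loop 3: bifurcation split point
                    # ... or split [i..j] into two independent subproblems
                    best = max(best, S[i][k] + S[k + 1][j])
                S[i][j] = best
        return S[0][n - 1]
    ```

    The innermost bifurcation loop is what makes the dependence pattern non-uniform and resistant to classical tiling; the paper's contribution is a space-time schedule under which all three loops become tileable.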



Eric P Xing - One of the best experts on this subject based on the ideXlab platform.

  • automating dependence aware parallelization of machine learning training on distributed shared memory
    European Conference on Computer Systems, 2019
    Co-Authors: Jinliang Wei, Garth Gibso, Phillip Gibbons, Eric P Xing
    Abstract:

    Machine learning (ML) training is commonly parallelized using data parallelism. A fundamental limitation of data parallelism is that conflicting (concurrent) parameter accesses during ML training usually diminish or even negate the benefits provided by additional parallel compute resources. Although it is possible to avoid conflicting parameter accesses by carefully scheduling the computation, existing systems rely on manual parallelization by the programmer, and it remains an open question when such parallelization is possible. We present Orion, a system that automatically parallelizes serial imperative ML programs on distributed shared memory. The core of Orion is a static dependence analysis mechanism that determines when dependence-preserving parallelization is effective and maps a loop computation to an optimized distributed computation schedule. Our evaluation shows that, for a number of ML applications, Orion can parallelize a serial program while preserving critical dependences, achieving a significantly faster convergence rate than data-parallel programs and a convergence rate and computation throughput comparable to state-of-the-art manual parallelizations, including model-parallel programs.
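    Orion itself performs static dependence analysis over imperative programs on distributed shared memory; as a much simpler, hypothetical single-machine sketch (the names and structure below are illustrative, not Orion's API), the core idea of scheduling computation so that conflicting parameter accesses never run concurrently can be shown by grouping updates by the parameter each one touches:

    ```python
    from collections import defaultdict
    from concurrent.futures import ThreadPoolExecutor

    def dependence_aware_update(params, updates, n_workers=4):
        # Hypothetical sketch: each update (idx, delta) touches exactly one
        # parameter index. Grouping updates by index guarantees that two
        # workers never write the same parameter concurrently, so the
        # parallel schedule preserves the serial program's dependences
        # without locks.
        by_param = defaultdict(list)
        for idx, delta in updates:
            by_param[idx].append(delta)

        def apply_group(item):
            idx, deltas = item
            for d in deltas:  # all writes to params[idx] stay on one worker
                params[idx] += d

        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            list(pool.map(apply_group, by_param.items()))
        return params
    ```

    A data-parallel scheme would instead shard `updates` arbitrarily across workers, allowing two workers to race on the same parameter; the grouping step is the (greatly simplified) analogue of a dependence-preserving schedule.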
