Biological Sequence

The experts below are selected from a list of 31,527 experts worldwide ranked by the ideXlab platform.

C Mears - One of the best experts on this subject based on the ideXlab platform.

  • A Simple Statistical Algorithm for Biological Sequence Compression
    2007 Data Compression Conference (DCC'07), 2007
    Co-Authors: Lloyd Allison, C Mears
    Abstract:

    This paper introduces a novel algorithm for biological sequence compression that makes use of both statistical properties and repetition within sequences. A panel of experts is maintained to estimate the probability distribution of the next symbol in the sequence to be encoded. Expert probabilities are combined to obtain the final distribution. The resulting information sequence provides insight for further study of the biological sequence. Each symbol is then encoded by arithmetic coding. Experiments show that our algorithm outperforms existing compressors on typical DNA and protein sequence datasets while maintaining a practical running time.
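
    The expert-panel idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the experts here are assumed to be simple adaptive order-k context models, the blending rule is assumed to be a Bayesian weight update, and the names `OrderKExpert` and `information_sequence` are hypothetical. The sketch reports the cost in bits that an ideal arithmetic coder would pay per symbol, rather than running an actual coder.

```python
import math

ALPHABET = "ACGT"


class OrderKExpert:
    """Adaptive order-k context model with add-one smoothing (illustrative expert)."""

    def __init__(self, k):
        self.k = k
        self.counts = {}  # context string -> {symbol: count}

    def _context(self, history):
        # history[-0:] would return the whole string, so order 0 is handled apart.
        return history[-self.k:] if self.k else ""

    def predict(self, history):
        seen = self.counts.get(self._context(history), {})
        total = sum(seen.values()) + len(ALPHABET)
        return {s: (seen.get(s, 0) + 1) / total for s in ALPHABET}

    def update(self, history, symbol):
        seen = self.counts.setdefault(self._context(history), {})
        seen[symbol] = seen.get(symbol, 0) + 1


def information_sequence(seq, experts):
    """Return the per-symbol cost in bits under a blend of expert predictions."""
    weights = [1.0 / len(experts)] * len(experts)
    bits = []
    for i, symbol in enumerate(seq):
        history = seq[:i]
        preds = [e.predict(history) for e in experts]
        # Blend: weighted average of the experts' probabilities for this symbol.
        p = sum(w * pr[symbol] for w, pr in zip(weights, preds))
        bits.append(-math.log2(p))
        # Assumed combination rule: experts that predicted well gain weight.
        weights = [w * pr[symbol] / p for w, pr in zip(weights, preds)]
        for e in experts:
            e.update(history, symbol)
    return bits


seq = "ACGT" * 50
bits = information_sequence(seq, [OrderKExpert(0), OrderKExpert(2)])
print(f"{sum(bits) / len(seq):.3f} bits per base")
```

    On a highly repetitive input such as the one above, the order-2 expert quickly dominates the blend and the average cost falls well below the 2 bits per base of a uniform code.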

Dominique Lavenier - One of the best experts on this subject based on the ideXlab platform.

  • SAMBA: hardware accelerator for Biological Sequence comparison
    Bioinformatics, 1997
    Co-Authors: Pascale Guerdoux-Jamet, Dominique Lavenier
    Abstract:

    Motivation: SAMBA (Systolic Accelerator for Molecular Biological Applications) is a 128-processor hardware accelerator for speeding up the sequence comparison process. The short-term objective is to provide a low-cost board to boost PC or workstation performance on this class of applications. This paper places SAMBA amongst other existing systems and highlights the original features.

    Results: Real performance obtained from the prototype is demonstrated. For example, a sequence of 300 amino acids is scanned against SWISS-PROT-34 (21,210,389 residues) in 30 s using the Smith and Waterman algorithm. More time-consuming applications, like the bank-to-bank comparison, are computed in a few hours instead of days on standard workstations. Technology allows the prototype to fit onto a single PCI board for plugging into any PC or workstation.

    Availability: SAMBA can be tested on the web server at http://www.irisa.fr/SAMBA/

  • SAMBA: hardware accelerator for Biological Sequence comparison.
    Computer applications in the biosciences : CABIOS, 1997
    Co-Authors: P Guerdoux-Jamet, Dominique Lavenier
    Abstract:

    SAMBA (Systolic Accelerator for Molecular Biological Applications) is a 128-processor hardware accelerator for speeding up the sequence comparison process. The short-term objective is to provide a low-cost board to boost PC or workstation performance on this class of applications. This paper places SAMBA amongst other existing systems and highlights the original features.

  • Dedicated Hardware for Biological Sequence Comparison.
    Journal of Universal Computer Science, 1996
    Co-Authors: Dominique Lavenier
    Abstract:

    Biological sequence comparison is a time-consuming task on a von Neumann computer. Adding dedicated hardware to parallelize the comparison algorithms reduces the execution time by several orders of magnitude. This paper presents and compares different dedicated approaches, based on the parallelization of the algorithms on linear arrays of processors.
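
    The abstracts above mention the Smith and Waterman algorithm and its parallelization on linear arrays of processors. A minimal sketch of why this works, assuming a linear gap penalty and illustrative scoring values (neither is taken from the papers): every cell on one anti-diagonal of the dynamic programming matrix depends only on the two previous anti-diagonals, so all cells of an anti-diagonal could be updated simultaneously, one per processor. The function name `sw_score_wavefront` is hypothetical, and the inner loop here runs sequentially.

```python
def sw_score_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score, filled anti-diagonal by anti-diagonal."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):      # d = i + j identifies one anti-diagonal
        lo = max(1, d - m)             # keeps j = d - i within 1..m
        hi = min(n, d - 1)
        # The cells visited by this inner loop do not depend on each other;
        # a linear processor array could compute them in a single step.
        for i in range(lo, hi + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + s,   # diagonal: match or mismatch
                H[i - 1][j] + gap,     # gap in b
                H[i][j - 1] + gap,     # gap in a
            )
            best = max(best, H[i][j])
    return best


print(sw_score_wavefront("TTACGTTT", "GGACGGG"))
```

    The sketch keeps the full matrix for readability; a hardware array would typically hold only the values each processor needs from the two preceding anti-diagonals.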

Alba Cristina Magalhaes Alves De Melo - One of the best experts on this subject based on the ideXlab platform.

  • Using Multiple Fickett Bands to Accelerate Biological Sequence Comparisons.
    Journal of Computational Biology, 2019
    Co-Authors: G. G. Silva, Edans F. De O. Sandes, George Teodoro, Alba Cristina Magalhaes Alves De Melo
    Abstract:

    Most of the exact algorithms for biological sequence comparison obtain the optimal result by calculating dynamic programming (DP) matrices with quadratic time and space complexity. Fickett...

  • Formalization of block pruning: reducing the number of cells computed in exact Biological Sequence comparison algorithms
    The Computer Journal, 2017
    Co-Authors: Edans F. De O. Sandes, George Teodoro, Xavier Martorell, Eduard Ayguadé, Maria Emilia M. T. Walter, Alba Cristina Magalhaes Alves De Melo
    Abstract:

    This is a pre-copyedited, author-produced version of an article accepted for publication in The Computer Journal following peer review. The version of record Edans F O Sandes, George L M Teodoro, Maria Emilia M T Walter, Xavier Martorell, Eduard Ayguade, Alba C M A Melo; Formalization of Block Pruning: Reducing the Number of Cells Computed in Exact Biological Sequence Comparison Algorithms, The Computer Journal, Volume 61, Issue 5, 1 May 2018, Pages 687–713 is available online at: The Computer Journal https://academic.oup.com/comjnl/article-abstract/61/5/687/4539903 and https://doi.org/10.1093/comjnl/bxx090.

  • Parallel Optimal Pairwise Biological Sequence Comparison: Algorithms, Platforms, and Classification
    ACM Computing Surveys, 2016
    Co-Authors: Edans F. De O. Sandes, Azzedine Boukerche, Alba Cristina Magalhaes Alves De Melo
    Abstract:

    Many bioinformatics applications, such as optimal pairwise biological sequence comparison, demand large amounts of computing resources and are thus excellent candidates to run on high-performance computing (HPC) platforms. In the last two decades, a large number of HPC-based solutions have been proposed for this problem. They run on different platforms and target different types of comparisons with slightly different algorithms, which makes a comparative analysis of these approaches very difficult. This article proposes a classification of parallel optimal pairwise sequence comparison solutions, in order to highlight their main characteristics in a unified way. We then discuss several HPC-based solutions, including clusters of multicores and accelerators such as Cell Broadband Engines (CellBEs), Field-Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs) and Intel Xeon Phi, as well as hybrid solutions, which combine two or more platforms, providing the current landscape of the main proposals in this area. Finally, we present open questions and perspectives in this research field.

  • A Hardware Accelerator for the Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space
    IEEE Transactions on Computers, 2010
    Co-Authors: Azzedine Boukerche, Alba Cristina Magalhaes Alves De Melo, Jan M Correa, Ricardo P Jacobi
    Abstract:

    The recent and astonishing accomplishments in the field of genomics would not have been possible without the techniques, algorithms, and tools developed in bioinformatics. Biological sequence comparison is an important operation in bioinformatics because it is used to determine how similar two sequences are. As a result of this operation, one or more alignments are produced. DIALIGN is an exact algorithm that uses dynamic programming to obtain optimal biological sequence alignments in quadratic space and time. One effective way to accelerate DIALIGN is to design FPGA-based architectures to execute it. Nevertheless, the complete retrieval of an alignment in hardware requires modifications to the original algorithm because it executes in quadratic space. In this paper, we propose and evaluate two FPGA-based accelerators executing DIALIGN in linear space: one to obtain the optimal DIALIGN score (DIALIGN-Score) and one to retrieve the DIALIGN alignment (DIALIGN-Alignment). Because there appears to be no documented variant of the DIALIGN algorithm that produces alignments in linear space, we here propose a linear space variant of the DIALIGN algorithm and have designed the DIALIGN-Alignment accelerator to implement it. The experimental results show that impressive speedups can be obtained with both accelerators when comparing long biological sequences: the DIALIGN-Score accelerator achieved a speedup of 383.4 and the DIALIGN-Alignment accelerator reached a speedup of 141.38.
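
    The distinction drawn above between computing a score and retrieving an alignment in linear space can be illustrated with a simpler recurrence. The sketch below is not DIALIGN: it swaps in a standard global alignment (Needleman-Wunsch style) with an assumed linear gap penalty and illustrative scoring values, and the function name `nw_score_linear_space` is hypothetical. It shows only that when each row depends on the previous row alone, the optimal score needs two rows of memory instead of the whole matrix.

```python
def nw_score_linear_space(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score using two rows of memory, O(len(b)) space."""
    # Row 0: aligning the empty prefix of a against each prefix of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + s,    # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] against a gap
                curr[j - 1] + gap,  # b[j-1] against a gap
            )
        prev = curr                 # rows older than the previous one are discarded
    return prev[-1]


print(nw_score_linear_space("ACGT", "AGT"))
```

    Discarding old rows also discards the traceback information, which is why recovering the alignment itself in linear space needs extra machinery, such as the divide-and-conquer scheme of Hirschberg for this simpler recurrence.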

Edans F. De O. Sandes - One of the best experts on this subject based on the ideXlab platform.

  • CUDAlign 3.0: Parallel Biological Sequence Comparison in Large GPU Clusters
    2014 14th IEEE ACM International Symposium on Cluster Cloud and Grid Computing, 2014
    Co-Authors: Edans F. De O. Sandes, A.C.M.A. De Melo, Guillermo Miranda, Xavier Martorell, Eduard Ayguadé
    Abstract:

    This paper proposes and evaluates a parallel strategy to execute the exact Smith-Waterman (SW) biological sequence comparison algorithm for huge DNA sequences on multi-GPU platforms. In our strategy, the computation of a single huge SW matrix is spread over multiple GPUs, each of which communicates its border elements to its neighbour using a circular buffer mechanism. We also provide a method to predict the execution time and speedup of a comparison, given the number of GPUs and the sizes of the sequences. The results obtained with a large multi-GPU environment show that our solution is scalable when varying the sizes of the sequences and/or the number of GPUs and that our prediction method is accurate. With our proposal, we were able to compare the largest human chromosome with its homologous chimpanzee chromosome (249 million base pairs (MBP) x 228 MBP) using 64 GPUs, achieving 1.7 TCUPS (Tera Cells Updated per Second). As far as we know, this is the largest comparison ever done using the Smith-Waterman algorithm.
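
    The border-exchange strategy described above can be sketched on a single machine. This is a hedged illustration, not CUDAlign: each simulated "device" is assumed to own a contiguous slice of matrix columns, the slices run one after another instead of in a pipeline, the circular buffer is reduced to a plain list handed to the next slice, and the scoring values and function names are hypothetical.

```python
def sw_slice(a, b_slice, left_border, match=1, mismatch=-3, gap=-2):
    """Compute one vertical slice of the Smith-Waterman matrix.

    left_border[i] holds H[i][j0 - 1], the column just left of the slice.
    Returns the slice's rightmost column and the best score found inside it.
    """
    n = len(a)
    prev_col = left_border
    best = 0
    for ch in b_slice:
        col = [0] * (n + 1)
        for i in range(1, n + 1):
            s = match if a[i - 1] == ch else mismatch
            col[i] = max(
                0,
                prev_col[i - 1] + s,  # diagonal neighbour
                prev_col[i] + gap,    # left neighbour (previous column)
                col[i - 1] + gap,     # upper neighbour (same column)
            )
            best = max(best, col[i])
        prev_col = col
    return prev_col, best


def sw_score_partitioned(a, b, num_devices, **scoring):
    """Split the columns over num_devices slices and chain their border columns."""
    border = [0] * (len(a) + 1)          # column 0 of the matrix is all zeros
    best = 0
    size = -(-len(b) // num_devices)     # ceiling division: columns per slice
    for d in range(num_devices):
        chunk = b[d * size:(d + 1) * size]
        # Only the border column crosses the slice boundary.
        border, slice_best = sw_slice(a, chunk, border, **scoring)
        best = max(best, slice_best)
    return best


a, b = "GGTTACGTAACC", "CCTTACGAAACC"
print(sw_score_partitioned(a, b, 1), sw_score_partitioned(a, b, 4))
```

    The partitioned result matches the single-slice result because a slice needs nothing from its left neighbour except that one border column. As a rough check on the figures quoted in the abstract, 249 million x 228 million is about 5.7 x 10^16 cells, which at 1.7 x 10^12 cells per second would take roughly 33,400 s, or about 9.3 hours, if that rate were sustained over the whole matrix.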

Lloyd Allison - One of the best experts on this subject based on the ideXlab platform.

  • Sequence complexity for Biological Sequence analysis
    Computational Biology and Chemistry, 2000
    Co-Authors: Lloyd Allison, Linda Stern, Timothy C Edgoose
    Abstract:

    A new statistical model for DNA considers a sequence to be a mixture of regions with little structure and regions that are approximate repeats of other subsequences, i.e. instances of repeats do not need to match each other exactly. Both forward and reverse-complementary repeats are allowed. The model has a small number of parameters, which are fitted to the data. In general there are many explanations for a given sequence, and it is shown how to compute the total probability of the data given the model. Computer algorithms are described for these tasks. The model can be used to compute the information content of a sequence, either in total or base by base. This amounts to looking at sequences from a data-compression point of view, and it is argued that this is a good way to tackle intelligent sequence analysis in general. © 2000 Elsevier Science Ltd. All rights reserved.
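
    The base-by-base information content mentioned above can be illustrated with a much simpler stand-in for the model. The sketch is an assumption-laden toy, not the authors' method: it blends a uniform "little structure" component with a single forward copy component, uses fixed rather than fitted parameters (`k`, `p_copy`, `w_copy` are all hypothetical), ignores reverse-complementary repeats, and does not sum over alternative explanations.

```python
import math


def per_base_information(seq, k=4, p_copy=0.9, w_copy=0.5):
    """Return the cost in bits of each base under a uniform/copy blend."""
    last_seen = {}  # k-mer -> index of the base that followed its latest occurrence
    bits = []
    for i, base in enumerate(seq):
        p = 0.25  # uniform component: four equally likely bases
        ctx = seq[i - k:i] if i >= k else None
        if ctx is not None and ctx in last_seen:
            # Copy component: expect the base that followed this k-mer last time,
            # while leaving some probability for a mismatch (approximate repeat).
            predicted = seq[last_seen[ctx]]
            p_rep = p_copy if base == predicted else (1 - p_copy) / 3
            p = (1 - w_copy) * 0.25 + w_copy * p_rep
        bits.append(-math.log2(p))
        if ctx is not None:
            last_seen[ctx] = i
    return bits


unit = "ACGTTGCAAGCT"
bits = per_base_information(unit * 4)
print([round(x, 2) for x in bits[:20]])
```

    Bases in the first copy of the repeated unit cost 2 bits each, while bases inside later copies cost less than 1 bit, so a plot of the per-base values would make the repeated region visible as a drop in information content.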