Gap Penalty

The Experts below are selected from a list of 5163 Experts worldwide ranked by ideXlab platform

Andrej Sali - One of the best experts on this subject based on the ideXlab platform.

  • Variable gap penalty for protein sequence-structure alignment
    Protein Engineering Design & Selection, 2006
    Co-Authors: M S Madhusudhan, Marc A Marti-Renom, Roberto Sanchez, Andrej Sali
    Abstract:

    The penalty for inserting gaps into an alignment between two protein sequences is a major determinant of the alignment accuracy. Here, we present an algorithm for finding a globally optimal alignment by dynamic programming that can use a variable gap penalty (VGP) function of any form. We also describe a specific function that depends on the structural context of an insertion or deletion. It penalizes gaps that are introduced within regions of regular secondary structure, buried regions, straight segments and also between two spatially distant residues. The parameters of the penalty function were optimized on a set of 240 sequence pairs of known structure, spanning the sequence identity range of 20-40%. We then tested the algorithm on another set of 238 sequence pairs of known structures. The use of the VGP function increases the number of correctly aligned residues from 81.0 to 84.5% in comparison with the optimized affine gap penalty function; this difference is statistically significant according to Student's t-test. We estimate that the new algorithm allows us to produce comparative models with approximately 7 million additional accurately modeled residues in the approximately 1.1 million proteins that are detectably related to a known structure.

  • A variable gap penalty function and feature weights for protein 3-D structure comparisons
    Protein Engineering, 1992
    Co-Authors: Zhanyang Zhu, Andrej Sali, Tom L Blundell
    Abstract:

    We have developed a variable gap penalty function for use in the comparison program COMPARER, which aligns protein sequences on the basis of their 3-D structures. For deletions and insertions, components are a function of structural features of individual amino acid residues (e.g. secondary structure and accessibility). We have also obtained relative weights for different features used in the comparison by examining the equivalent residues in weight matrices and in alignments for pairs of 3-D structures where the equivalencies are relatively unambiguous. We have used the new parameters and the variable gap penalty function in COMPARER to align protein structures in the Brookhaven Data Bank. The variable gap penalty function is especially useful in avoiding gaps in secondary structure elements, and the new feature weights give improved alignments. The alignments for both azurins and plastocyanins and N- and C-terminal lobes for aspartic proteinases are discussed.
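
Both abstracts above describe gap penalties that depend on the structural context of the gap. As a rough illustration of that idea only, the sketch below runs a plain global dynamic-programming alignment in which the gap cost is looked up per structure position. It is not the VGP algorithm or COMPARER: the names `SS_WEIGHT`, `gap_cost` and `align_score`, the linear (non-affine) gap model, and every numeric weight are assumptions chosen for the example.

```python
# Illustrative only: the weights and scores are hypothetical, not published values.
SS_WEIGHT = {"H": 3.0, "E": 3.0, "C": 1.0}  # helix, strand, coil


def gap_cost(ss, i, base=2.0):
    """Penalty for a gap next to structure position i (clamped to the ends)."""
    i = min(max(i, 0), len(ss) - 1)
    return base * SS_WEIGHT.get(ss[i], 1.0)


def align_score(a, ss, b, match=2.0, mismatch=-1.0):
    """Global alignment score of sequence a (secondary-structure states ss,
    one character per residue of a, non-empty) against sequence b."""
    m, n = len(a), len(b)
    S = [[0.0] * (n + 1) for _ in range(m + 1)]
    # Boundary: leading gaps pay the penalty of the structure position they skip.
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] - gap_cost(ss, i - 1)
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] - gap_cost(ss, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            penalty = gap_cost(ss, i - 1)  # context of structure residue i-1
            S[i][j] = max(
                S[i - 1][j - 1] + sub,    # align a[i-1] with b[j-1]
                S[i - 1][j] - penalty,    # delete a[i-1]
                S[i][j - 1] - penalty,    # insert b[j-1]
            )
    return S[m][n]
```

With `a = "ACDEFG"` and `b = "ACDFG"`, an all-coil annotation scores higher than an annotation that places a helix over the deleted residue, which is the qualitative behaviour the abstracts describe: gaps inside regular secondary structure become more expensive.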

Todd J Treangen - One of the best experts on this subject based on the ideXlab platform.

  • Vulcan: improved long-read mapping and structural variant calling via dual-mode alignment
    GigaScience, 2021
    Co-Authors: Medhat Mahmoud, Viginesh Vaibhav Muraliraman, Fritz J Sedlazeck, Todd J Treangen
    Abstract:

    Background: Long-read sequencing has enabled unprecedented surveys of structural variation across the entire human genome. To maximize the potential of long-read sequencing in this context, novel mapping methods have emerged that have primarily focused on either speed or accuracy. Various heuristics and scoring schemas have been implemented in widely used read mappers (minimap2 and NGMLR) to optimize for speed or accuracy, which have variable performance across different genomic regions and for specific structural variants. Our hypothesis is that constraining read mapping to the use of a single gap penalty across distinct mutational hot spots reduces read alignment accuracy and impedes structural variant detection.
    Findings: We tested our hypothesis by implementing a read-mapping pipeline called Vulcan that uses two distinct gap penalty modes, which we refer to as dual-mode alignment. The high-level idea is that Vulcan leverages the computed normalized edit distance of the mapped reads via minimap2 to identify poorly aligned reads and realigns them using the more accurate yet computationally more expensive long-read mapper (NGMLR). In support of our hypothesis, we show that Vulcan improves the alignments for Oxford Nanopore Technology long reads for both simulated and real datasets. These improvements, in turn, lead to improved accuracy for structural variant calling performance on human genome datasets compared to either of the read-mapping methods alone.
    Conclusions: Vulcan is the first long-read mapping framework that combines two distinct gap penalty modes for improved structural variant recall and precision. Vulcan is open-source and available under the MIT License at https://gitlab.com/treangenlab/vulcan.

  • Vulcan: improved long-read mapping and structural variant calling via dual-mode alignment
    bioRxiv, 2021
    Co-Authors: Medhat Mahmoud, Viginesh Vaibhav Muraliraman, Fritz J Sedlazeck, Todd J Treangen
    Abstract:

    Background: Long-read sequencing has enabled unprecedented surveys of structural variation across the entire human genome. To maximize the potential of long-read sequencing in this context, novel mapping methods have emerged that have primarily focused on either speed or accuracy. Various heuristics and scoring schemas have been implemented in widely used read mappers (minimap2 and NGMLR) to optimize for speed or accuracy, which have variable performance across different genomic regions and for specific structural variants. Our hypothesis is that constraining read mapping to the use of a single gap penalty across distinct mutational hotspots reduces read alignment accuracy and impedes structural variant detection.
    Findings: We tested our hypothesis by implementing a read mapping pipeline called Vulcan that uses two distinct gap penalty modes, which we refer to as dual-mode alignment. The high-level idea is that Vulcan leverages the computed normalized edit distance of the mapped reads via, e.g., minimap2 to identify poorly aligned reads and realigns them using the more accurate yet computationally more expensive long read mapper (NGMLR). In support of our hypothesis, we show Vulcan improves the alignments for Oxford Nanopore Technology (ONT) long reads for both simulated and real datasets. These improvements, in turn, lead to improved accuracy for structural variant calling performance on human genome datasets compared to either of the read mapping methods alone.
    Conclusions: Vulcan is the first long-read mapping framework that combines two distinct gap penalty modes, resulting in improved structural variant recall and precision. Vulcan is open-source and available under the MIT License at https://gitlab.com/treangenlab/vulcan
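
The selection step described in the Vulcan abstracts (use normalized edit distance to find poorly aligned reads, then hand those to a second mapper) can be sketched roughly as follows. This is not Vulcan's code: `MappedRead`, `normalized_edit_distance`, `split_for_realignment` and the fixed `cutoff` value are hypothetical names and parameters for illustration, and the abstracts do not say how the threshold is chosen. The edit distance is assumed to come from something like the `NM` tag of a SAM record.

```python
from dataclasses import dataclass


@dataclass
class MappedRead:
    name: str
    edit_distance: int   # e.g. the NM tag reported by the first-pass mapper
    aligned_length: int  # number of aligned bases; 0 if the read is unmapped


def normalized_edit_distance(read):
    """Edit distance per aligned base; unmapped reads count as worst case."""
    if read.aligned_length == 0:
        return 1.0
    return read.edit_distance / read.aligned_length


def split_for_realignment(reads, cutoff=0.1):
    """Partition reads into (kept, to_realign) using a hypothetical cutoff."""
    kept, to_realign = [], []
    for read in reads:
        if normalized_edit_distance(read) > cutoff:
            to_realign.append(read)  # would go to the slower, second mapper
        else:
            kept.append(read)        # first-pass alignment is accepted
    return kept, to_realign
```

In a two-pass pipeline of this shape, only the `to_realign` subset pays the cost of the more expensive mapper, which is the trade-off between speed and accuracy that the abstract motivates.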

Medhat Mahmoud - One of the best experts on this subject based on the ideXlab platform.

  • Vulcan: improved long-read mapping and structural variant calling via dual-mode alignment
    GigaScience, 2021
    Co-Authors: Medhat Mahmoud, Viginesh Vaibhav Muraliraman, Fritz J Sedlazeck, Todd J Treangen

  • Vulcan: improved long-read mapping and structural variant calling via dual-mode alignment
    bioRxiv, 2021
    Co-Authors: Medhat Mahmoud, Viginesh Vaibhav Muraliraman, Fritz J Sedlazeck, Todd J Treangen

Kishan G. Mehrotra - One of the best experts on this subject based on the ideXlab platform.

  • Parallel biological sequence comparison using prefix computations
    Journal of Parallel and Distributed Computing, 2003
    Co-Authors: Srinivas Aluru, Natsuhiko Futamura, Kishan G. Mehrotra
    Abstract:

    We present practical parallel algorithms using prefix computations for various problems that arise in pairwise comparison of biological sequences. We consider both constant and affine gap penalty functions, full-sequence and subsequence matching, and space-saving algorithms. Commonly used sequential algorithms solve the sequence comparison problems in O(mn) time and O(m + n) space, where m and n are the lengths of the sequences being compared. All the algorithms presented in this paper are time optimal with respect to the sequential algorithms and can use O(n/log n) processors, where n is the length of the larger sequence. While optimal parallel algorithms for many of these problems are known, we use a simple framework and demonstrate how these problems can be solved systematically using repeated parallel prefix operations. We also present a space-saving algorithm that uses O(m + n/p) space and runs in optimal time, where p is the number of processors used. We implemented the parallel space-saving algorithm and provide experimental results on an IBM SP-2 and a Pentium cluster.

  • Parallel biological sequence comparison using prefix computations
    International Parallel Processing Symposium, 1999
    Co-Authors: Srinivas Aluru, Natsuhiko Futamura, Kishan G. Mehrotra
    Abstract:

    We present practical parallel algorithms using prefix computations for various problems that arise in pairwise comparison of biological sequences. We consider both constant and affine gap penalty functions, full-sequence and subsequence matching, and space-saving algorithms. The best known sequential algorithms solve these problems in O(mn) time and O(m + n) space, where m and n are the lengths of the two sequences. All the algorithms presented in this paper are time optimal with respect to the best known sequential algorithms and can use O(n/log n) processors, where n is the length of the larger sequence. While optimal parallel algorithms for many of these problems are known, we use a simple framework and demonstrate how these problems can be solved systematically using repeated parallel prefix operations. We also present a space-saving algorithm that uses O(m + n/p) space and runs in optimal time, where p is the number of processors used.
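
To see why a prefix computation fits this problem, consider one row of the global alignment table with a constant per-residue gap penalty g. Writing w[j] for the best score entering cell j from the previous row, the row satisfies cur[j] = max(w[j], cur[j-1] - g). Substituting x[j] = cur[j] + j*g turns this into x[j] = max(w[j] + j*g, x[j-1]), which is a prefix maximum. The sketch below checks that identity against the ordinary sequential recurrence. It is a serial illustration under assumed scoring values, not the paper's parallel implementation; the function names are made up for the example, and `itertools.accumulate` merely stands in for a scan that a parallel machine could evaluate across processors.

```python
from itertools import accumulate


def row_sequential(prev, a_i, b, g, match, mismatch):
    """One DP row computed left to right with the usual recurrence."""
    cur = [prev[0] - g]
    for j in range(1, len(b) + 1):
        s = match if a_i == b[j - 1] else mismatch
        cur.append(max(prev[j - 1] + s, prev[j] - g, cur[j - 1] - g))
    return cur


def row_prefix(prev, a_i, b, g, match, mismatch):
    """The same row expressed as a prefix-max (scan) over shifted values."""
    n = len(b)
    # w[j]: best score reaching cell j from the previous row only.
    w = [prev[0] - g] + [
        max(prev[j - 1] + (match if a_i == b[j - 1] else mismatch), prev[j] - g)
        for j in range(1, n + 1)
    ]
    # x[j] = cur[j] + j*g obeys x[j] = max(w[j] + j*g, x[j-1]).
    x = list(accumulate((w[j] + j * g for j in range(n + 1)), max))
    return [x[j] - j * g for j in range(n + 1)]


def global_score(a, b, row_fn, g=1, match=1, mismatch=-1):
    """Global alignment score using O(len(b)) space, one row at a time."""
    prev = [-j * g for j in range(len(b) + 1)]
    for a_i in a:
        prev = row_fn(prev, a_i, b, g, match, mismatch)
    return prev[-1]
```

Calling `global_score(a, b, row_sequential)` and `global_score(a, b, row_prefix)` should return the same value for any pair of strings, since the two row functions compute the same recurrence in different forms.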

Tom L Blundell - One of the best experts on this subject based on the ideXlab platform.

  • FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties
    Journal of Molecular Biology, 2001
    Co-Authors: Jiye Shi, Tom L Blundell, Kenji Mizuguchi
    Abstract:

    FUGUE, a program for recognizing distant homologues by sequence-structure comparison (http://www-cryst.bioc.cam.ac.uk/fugue/), has three key features. (1) Improved environment-specific substitution tables. Substitutions of an amino acid in a protein structure are constrained by its local structural environment, which can be defined in terms of secondary structure, solvent accessibility, and hydrogen bonding status. The environment-specific substitution tables have been derived from structural alignments in the HOMSTRAD database (http://www-cryst.bioc.cam.ac.uk/homstrad/). (2) Automatic selection of alignment algorithm with detailed structure-dependent gap penalties. FUGUE uses the global-local algorithm to align a sequence-structure pair when they greatly differ in length and uses the global algorithm in other cases. The gap penalty at each position of the structure is determined according to its solvent accessibility, its position relative to the secondary structure elements (SSEs) and the conservation of the SSEs. (3) Combined information from both multiple sequences and multiple structures. FUGUE is designed to align multiple sequences against multiple structures to enrich the conservation/variation information. We demonstrate that the combination of these three key features implemented in FUGUE improves both homology recognition performance and alignment accuracy.

  • A variable gap penalty function and feature weights for protein 3-D structure comparisons
    Protein Engineering, 1992
    Co-Authors: Zhanyang Zhu, Andrej Sali, Tom L Blundell
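
The FUGUE abstract above mentions switching between a global and a global-local algorithm depending on how much the two inputs differ in length. A minimal sketch of that switch is given below, assuming a simple interpretation in which "global-local" means the shorter input is aligned end to end while unaligned overhangs of the longer input are free. The names `choose_mode` and `fit_score`, the length-ratio threshold of 1.5, and the scoring values are all hypothetical, and a single constant gap penalty replaces FUGUE's structure-dependent penalties.

```python
def choose_mode(len_a, len_b, ratio=1.5):
    """Pick an alignment mode from the length ratio (hypothetical threshold)."""
    longer, shorter = max(len_a, len_b), min(len_a, len_b)
    if shorter == 0 or longer / shorter > ratio:
        return "global-local"
    return "global"


def fit_score(short, long, mode, match=2, mismatch=-1, gap=2):
    """Score `short` against `long`; in global-local mode the leading and
    trailing overhangs of `long` are not penalized."""
    n = len(long)
    free_ends = mode == "global-local"
    # Row 0: skipping a prefix of `long` is free only in global-local mode.
    prev = [0 if free_ends else -j * gap for j in range(n + 1)]
    for i, ch in enumerate(short, start=1):
        cur = [-i * gap]
        for j in range(1, n + 1):
            s = match if ch == long[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s, prev[j] - gap, cur[j - 1] - gap))
        prev = cur
    # Last row: a free suffix of `long` means taking the best cell in the row.
    return max(prev) if free_ends else prev[n]
```

For example, `fit_score("CDE", "AACDEFF", choose_mode(3, 7))` aligns the three residues without charging for the four flanking residues of the longer string, whereas forcing `mode="global"` on the same pair charges a gap for each of them.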