Substitution Matrix

The experts below are selected from a list of 5,964 experts worldwide, ranked by the ideXlab platform.

Pierre M Durand - One of the best experts on this subject based on the ideXlab platform.

  • Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix
    BMC Bioinformatics, 2015
    Co-Authors: Andrew Ndhlovu, Scott Hazelhurst, Pierre M Durand
    Abstract:

    Background: Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites that withstand change. These profiles remain detectable even when protein sequences have diverged extensively. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a database with a search algorithm. However, problems arise when sequence similarity is low. This study presents an algorithm that uses the evolutionary rate at codon sites, the dN/dS (ω) parameter, coupled to a substitution matrix as an alignment metric for detecting distantly related proteins. The algorithm, called BLOSUM-FIRE, couples a newer and improved version of the original FIRE (Functional Inference using Rates of Evolution) algorithm with an amino acid substitution matrix in a dynamic scoring function. The enigmatic hepatitis B virus X protein was used as a test case for BLOSUM-FIRE and its associated database, EvoDB.
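
As a rough illustration of folding site-specific rates and a substitution matrix into one alignment score, the sketch below runs a Smith-Waterman local alignment whose column score is a weighted blend of a residue score and a dN/dS (ω) agreement term. It is not the BLOSUM-FIRE scoring function: the residue groups, the blend `weight`, the log-ratio rate term and the linear gap penalty are all assumptions made for illustration, and the toy scores are not BLOSUM62 values.

```python
import math

# Hypothetical residue groups used to build a toy substitution score.
# These are NOT BLOSUM62 values; swap in a real matrix for actual use.
GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST"]


def sub_score(a: str, b: str) -> float:
    """Toy substitution score: identity > same group > anything else."""
    if a == b:
        return 2.0
    if any(a in g and b in g for g in GROUPS):
        return 1.0
    return -1.0


def rate_score(wa: float, wb: float) -> float:
    """Assumed rate-agreement term: highest when two sites have similar
    dN/dS (omega) values. Requires omega > 0 because of the logarithm."""
    return 2.0 - abs(math.log(wa) - math.log(wb))


def local_align_score(seq_a, omega_a, seq_b, omega_b, weight=0.5, gap=-2.0):
    """Smith-Waterman local alignment score with a blended column score.

    omega_a / omega_b hold one omega value per residue of seq_a / seq_b.
    weight sets how much the rate term counts relative to the residue term.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    h = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            column = ((1 - weight) * sub_score(seq_a[i - 1], seq_b[j - 1])
                      + weight * rate_score(omega_a[i - 1], omega_b[j - 1]))
            h[i][j] = max(0.0,                       # start a new local alignment
                          h[i - 1][j - 1] + column,  # pair residue i with residue j
                          h[i - 1][j] + gap,         # residue of seq_a against a gap
                          h[i][j - 1] + gap)         # residue of seq_b against a gap
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    omegas = [0.1, 0.2, 0.1, 0.3, 0.1]
    print(local_align_score("MKVLE", omegas, "MRVIE", omegas))
    print(local_align_score("MKVLE", omegas, "MRVIE", [2.0] * 5))
```

Both calls align the same residues, so the lower second score comes entirely from the rate term, which is the kind of extra signal a rate-aware metric is meant to contribute.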

Kay Hamacher - One of the best experts on this subject based on the ideXlab platform.

  • Substitution matrix based color schemes for sequence alignment visualization
    BMC Bioinformatics, 2020
    Co-Authors: Patrick Kunzmann, Benjamin E Mayer, Kay Hamacher
    Abstract:

    Visualization of multiple sequence alignments often includes colored symbols, usually characters encoding amino acids, colored according to some (physical) property such as hydrophobicity or charge. Typically, color schemes are created manually, so that equal or similar colors are assigned to amino acids that share similar properties. However, this assessment is subjective and may not represent the similarity of symbols very well. In this article we propose a different approach to color scheme creation: we leverage the similarity information of a substitution matrix to derive an appropriate color scheme. Similar colors are assigned to high-scoring pairs of symbols, and distant colors are assigned to low-scoring pairs. A simulated annealing algorithm is employed to find the optimal points in color space. Using the substitution matrix as the basis for a color scheme keeps the coloring consistent with the alignment, which is itself based on the same substitution matrix. This approach allows fully automatic generation of new color schemes, even for special purposes that have not yet been covered, including schemes for structural alphabets or schemes adapted for people with color vision deficiency.
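
A minimal sketch of the idea, assuming the RGB unit cube as a stand-in for a proper color space: substitution scores are mapped to target distances, and simulated annealing moves one symbol at a time to minimize the squared mismatch between point distances and target distances. The toy scores, the Gaussian step size, the geometric cooling schedule and the `stress` objective are assumptions for illustration; the paper's actual cost function and color space are not described in this abstract.

```python
import math
import random

# Hypothetical toy groups standing in for a real substitution matrix.
GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST"]


def toy_score(a: str, b: str) -> float:
    if a == b:
        return 2.0
    if any(a in g and b in g for g in GROUPS):
        return 1.0
    return -1.0


def target_distances(symbols, score):
    """High-scoring pairs get small target distances, low-scoring pairs large ones."""
    pairs = [(a, b) for i, a in enumerate(symbols) for b in symbols[i + 1:]]
    values = [score(a, b) for a, b in pairs]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return {pair: (hi - v) / span for pair, v in zip(pairs, values)}


def stress(points, targets):
    """Squared mismatch between point distances and target distances."""
    return sum((math.dist(points[a], points[b]) - d) ** 2
               for (a, b), d in targets.items())


def anneal_colors(symbols, score, steps=20000, t_start=1.0, t_end=1e-4, seed=0):
    """Simulated annealing in the RGB unit cube (a stand-in for a color space)."""
    rng = random.Random(seed)
    targets = target_distances(symbols, score)
    points = {s: [rng.random() for _ in range(3)] for s in symbols}
    current = stress(points, targets)
    best, best_points = current, {s: p[:] for s, p in points.items()}
    cooling = (t_end / t_start) ** (1.0 / steps)  # geometric cooling schedule
    temp = t_start
    for _ in range(steps):
        s = rng.choice(symbols)
        old = points[s]
        # Move one symbol by a small Gaussian step, clamped to the cube.
        points[s] = [min(1.0, max(0.0, x + rng.gauss(0.0, 0.1))) for x in old]
        new = stress(points, targets)
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new
            if current < best:
                best, best_points = current, {k: p[:] for k, p in points.items()}
        else:
            points[s] = old  # reject the move
        temp *= cooling
    return best_points, best


def to_hex(rgb):
    return "#{:02x}{:02x}{:02x}".format(*(round(255 * x) for x in rgb))


if __name__ == "__main__":
    colors, final_stress = anneal_colors("ILVDE", toy_score)
    for sym, rgb in colors.items():
        print(sym, to_hex(rgb))
```

Symbols from the same toy group (I, L, V and D, E) should end up with similar colors, and symbols from different groups with distant ones.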

  • PFASUM: a substitution matrix from Pfam structural alignments
    BMC Bioinformatics, 2017
    Co-Authors: Frank Keul, Martin Hess, Michael Goesele, Kay Hamacher
    Abstract:

    Detecting homologous protein sequences and computing multiple sequence alignments (MSAs) are fundamental tasks in molecular bioinformatics. These tasks usually require a substitution matrix, derived from a set of aligned sequences, for modeling evolutionary substitution events. In recent years the known sequence space has grown drastically, and several publications have demonstrated that this can lead to significantly better-performing matrices. Interestingly, matrices based on dated sequence datasets are still the de facto standard for both tasks, even though their data basis may limit their capabilities. We address these aspects by presenting a new substitution matrix series called PFASUM. These matrices are derived from Pfam seed MSAs using a novel algorithm and thus build upon expert ground truth data covering a large and diverse sequence space. We show results for two use cases. First, we tested the homology search performance of PFASUM matrices on up-to-date ASTRAL databases with varying sequence similarity. Our study shows that using PFASUM matrices can lead to significantly better homology search results than using conventional matrices. PFASUM matrices with relative entropies comparable to those of the commonly used substitution matrices BLOSUM50, BLOSUM62, PAM250, VTML160 and VTML200 outperformed their counterparts in 93% of all test cases. A general assessment that also compared matrices with different relative entropies showed that PFASUM matrices delivered the best homology search performance in the test set. Second, our results demonstrate that using PFASUM matrices for MSA construction improves MSA quality compared to conventional matrices. On up-to-date MSA benchmarks, at least 60% of all MSAs were reconstructed with equal or higher quality when using MUSCLE with the PFASUM31, PFASUM43 and PFASUM60 matrices instead of conventional matrices. This rate increases to at least 76% for MSAs containing similar sequences.
    We present the novel PFASUM substitution matrices derived from manually curated MSA ground truth data covering the currently known sequence space. Our results imply that PFASUM matrices improve homology search performance as well as MSA quality in many cases when compared to conventional substitution matrices. Hence, we encourage the use of PFASUM matrices, especially PFASUM60, for these specific tasks.
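
The abstract does not describe PFASUM's novel derivation algorithm. For orientation only, the sketch below shows a generic BLOSUM-style log-odds recipe (not PFASUM's method): count residue pairs within alignment columns, compare observed pair frequencies with those expected from background frequencies, and report scores in half-bit units. It also computes the relative entropy, the quantity the abstract uses to match PFASUM matrices with BLOSUM50, BLOSUM62 and the others. Sequence clustering and weighting, which real matrices use, are deliberately left out.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(columns, scale=2.0):
    """BLOSUM-style log-odds scores from alignment columns.

    columns: one string per alignment column (gaps already removed).
    scale=2.0 reports scores in half-bit units, as BLOSUM matrices do.
    No sequence clustering or weighting is applied.
    Returns (scores keyed by sorted residue pair, relative entropy in bits).
    """
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies

    # Background frequency p_a = q_aa + sum over b != a of q_ab / 2.
    p = Counter()
    for (a, b), f in q.items():
        p[a] += f / 2
        p[b] += f / 2

    scores, rel_entropy = {}, 0.0
    for (a, b), f in q.items():
        expected = p[a] * p[b] * (1 if a == b else 2)  # pair frequency expected by chance
        bits = math.log2(f / expected)
        scores[(a, b)] = round(scale * bits)
        rel_entropy += f * bits  # H = sum of q_ab * log2(q_ab / e_ab)
    return scores, rel_entropy


if __name__ == "__main__":
    scores, h = log_odds_matrix(["AAAA", "SSSS", "AASS"])
    print(scores, round(h, 3))
```

Pairs that never occur in the input columns get no score here; a real derivation needs enough data (or smoothing) to cover all 210 residue pairs.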

Jinnmoon Yang - One of the best experts on this subject based on the ideXlab platform.

David Eisenberg - One of the best experts on this subject based on the ideXlab platform.

  • A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence
    Journal of Molecular Biology, 1997
    Co-Authors: Danny W Rice, David Eisenberg
    Abstract:

    In protein fold recognition, a probe amino acid sequence is compared to a library of representative folds of known structure to identify a structural homolog. In cases where the probe and its homolog have clear sequence similarity, traditional residue substitution matrices have been used to predict the structural similarity. For cases where the probe is distantly related in sequence to its homolog, we have developed a (7 × 3 × 2 × 7 × 3) 3D-1D substitution matrix (called H3P2), calculated from a database of 119 structural pairs. Members of each pair share a similar fold but have less than 30% sequence identity. Each probe sequence position is defined by one of seven residue classes and three secondary structure classes. Each homologous fold position is defined by one of seven residue classes, three secondary structure classes and two burial classes. Thus the matrix is five-dimensional and contains 7 × 3 × 2 × 7 × 3 = 882 elements, or 3D-1D scores. The first step in assigning a probe sequence to its homologous fold is the prediction of the three-state (helix, strand, coil) secondary structure of the probe; here we use the profile-based neural network prediction of secondary structure (PHD) program. A dynamic programming algorithm then uses the H3P2 matrix to align the probe sequence with structures in a representative fold library. To test the effectiveness of the H3P2 matrix, a challenging, fold-class-diverse and cross-validated benchmark assessment is used to compare it with the GONNET, PAM250 and BLOSUM62 matrices and a secondary-structure-only substitution matrix. For distantly related sequences, the H3P2 matrix detects more homologous structures at higher reliabilities than these other substitution matrices, based on sensitivity versus specificity plots (SENS-SPEC plots).
    The added efficacy of the H3P2 matrix arises from its information on the statistical preferences for various sequence-structure environment combinations from very distantly related proteins. It introduces the predicted secondary structure information from a sequence into fold recognition in a statistical way that normalizes the inherent correlations between residue type, secondary structure and solvent accessibility.
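
The size of the H3P2 matrix follows directly from the class counts in the abstract. The sketch below assumes the axis order (fold residue class, fold secondary structure, burial, probe residue class, probe secondary structure) implied by the (7 × 3 × 2 × 7 × 3) notation, stores the 882 scores in a flat list, and looks one up per aligned position. The helper names and the placeholder scores are hypothetical; the real H3P2 values are not reproduced here.

```python
import math
from itertools import product

# Class counts from the abstract; the axis order is an assumption:
# fold residue class, fold secondary structure, fold burial,
# probe residue class, probe (predicted) secondary structure.
SHAPE = (7, 3, 2, 7, 3)
N_ELEMENTS = math.prod(SHAPE)  # 882 3D-1D scores


def flat_index(fold_res, fold_ss, burial, probe_res, probe_ss):
    """Row-major index of one 3D-1D score in a flat list of N_ELEMENTS values."""
    idx = 0
    for value, size in zip((fold_res, fold_ss, burial, probe_res, probe_ss), SHAPE):
        if not 0 <= value < size:
            raise ValueError(f"class index {value} outside 0..{size - 1}")
        idx = idx * size + value
    return idx


def gapless_score(matrix, probe, fold):
    """Score an ungapped probe-to-fold alignment.

    probe: list of (residue class, predicted SS class) per position.
    fold:  list of (residue class, SS class, burial class) per position.
    A dynamic programming alignment would use the same lookup per cell.
    """
    return sum(matrix[flat_index(f_res, f_ss, bur, p_res, p_ss)]
               for (p_res, p_ss), (f_res, f_ss, bur) in zip(probe, fold))


if __name__ == "__main__":
    # Every cell of the 5-D matrix maps to a distinct slot 0..881.
    cells = [flat_index(*c) for c in product(*(range(n) for n in SHAPE))]
    assert sorted(cells) == list(range(N_ELEMENTS))
    matrix = [0.0] * N_ELEMENTS  # placeholder scores, not H3P2 values
    matrix[flat_index(0, 0, 1, 0, 0)] = 1.5
    print(N_ELEMENTS, gapless_score(matrix, [(0, 0)], [(0, 0, 1)]))
```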

Wei-mou Zheng - One of the best experts on this subject based on the ideXlab platform.

  • An amino acid substitution matrix for protein conformation identification
    Journal of Bioinformatics and Computational Biology, 2006
    Co-Authors: Wei-mou Zheng
    Abstract:

    Amino acid substitution matrices play an essential role in protein sequence alignment, a fundamental task in bioinformatics. The most widely used matrices, such as the PAM matrices derived from homologous sequences and the BLOSUM matrices derived from aligned segments of PROSITE, do not integrate conformation information in their construction. The few structure-based matrices that exist are derived from limited structural alignment data. Using the PDB_SELECT and DSSP databases, we create a database of sequence-conformation blocks that explicitly represent the sequence-structure relationship. Members of a block are identical in conformation and highly similar in sequence. From this block database, we derive a conformation-specific amino acid substitution matrix, CBSM60. The matrix shows improved performance in conformational segment search and homolog detection.

  • Relation between weight matrix and substitution matrix: motif search by similarity
    Bioinformatics, 2005
    Co-Authors: Wei-mou Zheng
    Abstract:

    Motivation: The discovery of patterns shared by several greatly differing sequences is a basic task in sequence analysis, and it remains a challenge. Several methods have been developed for detecting patterns. Methods commonly used for motif search include the Gibbs sampler, the Expectation-Maximization (EM) algorithm and some intuitive greedy approaches. The optimality of the result produced by the Gibbs sampler in a single run cannot be guaranteed. The deterministic EM methods tend to get trapped in local optima. Solutions found by greedy approaches are rarely good enough. Results: A simple model describing a motif or a portion of a local multiple sequence alignment is the weight matrix model, in which a motif is characterized by position-specific probabilities. Two substitution matrices are proposed to relate sequence similarity to the weight matrix. Combining the substitution matrix and the weight matrix, we examine three typical sets of protein sequences of increasing complexity. At a low score threshold for pair similarity, sliding windows are compared with a seed window to find the score sum, which provides a measure of statistical significance for multiple sequence comparison. Such a similarity analysis reveals many aspects of motifs. Blocks determined by similarity can be used to deduce a primary weight matrix or an improved substitution matrix. The algorithm obtains the optimal solution for the test sets through greedy iteration alone. Availability: Software and sequence datasets are available on request from the author.
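
The seed-window comparison and the weight matrix model can be combined in a simple greedy loop. The sketch below is a simplified illustration, not the paper's algorithm (its two proposed substitution matrices are not reproduced): it uses a hypothetical toy substitution score, keeps each sequence's best window if its score against the seed reaches a threshold, builds a position-specific weight matrix with pseudocounts from the kept windows, and replaces the seed with that matrix's consensus until the seed stops changing.

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical toy groups standing in for a real substitution matrix.
GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST"]


def sub_score(a, b):
    if a == b:
        return 2
    if any(a in g and b in g for g in GROUPS):
        return 1
    return -1


def best_window(seed, seq):
    """Best-scoring window of len(seed) in seq, compared position by position."""
    w = len(seed)
    best = None
    for start in range(len(seq) - w + 1):
        window = seq[start:start + w]
        score = sum(sub_score(a, b) for a, b in zip(seed, window))
        if best is None or score > best[0]:
            best = (score, window)
    return best


def weight_matrix(windows, pseudocount=1.0):
    """Position-specific residue probabilities with additive pseudocounts."""
    total = len(windows) + pseudocount * len(ALPHABET)
    matrix = []
    for pos in range(len(windows[0])):
        counts = Counter(win[pos] for win in windows)
        matrix.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return matrix


def greedy_motif(seed, sequences, threshold, max_iter=10):
    """Collect windows similar to the seed, rebuild the seed as the consensus
    of their weight matrix, and stop when the seed no longer changes."""
    kept, score_sum = [], 0
    for _ in range(max_iter):
        hits = [best_window(seed, s) for s in sequences if len(s) >= len(seed)]
        hits = [(score, win) for score, win in hits if score >= threshold]
        kept = [win for _, win in hits]
        score_sum = sum(score for score, _ in hits)
        if not kept:
            break
        pwm = weight_matrix(kept)
        consensus = "".join(max(col, key=col.get) for col in pwm)
        if consensus == seed:
            break
        seed = consensus
    return seed, kept, score_sum


if __name__ == "__main__":
    seqs = ["GGKVLEGG", "AAKILEAA", "GGRVLDGG"]
    print(greedy_motif("KVLE", seqs, threshold=4))
```

With these three toy sequences the seed is already the consensus, so the loop stops after one pass with the windows KVLE, KILE and RVLD and a score sum of 21.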

  • A protein structural alphabet and its substitution matrix CLESUM
    arXiv: Biomolecules, 2004
    Co-Authors: Wei-mou Zheng
    Abstract:

    By using a mixture model for the density distribution of the three pseudobond angles formed by the $C_\alpha$ atoms of four consecutive residues, the local structural states are discretized as 17 conformational letters of a protein structural alphabet. This coarse-graining procedure converts a 3D structure to a 1D code sequence. A substitution matrix between these letters is constructed based on the structural alignments of the FSSP database.
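
The coarse-graining step starts from geometry. Assuming the three pseudobond angles are the two C-alpha bend angles and the pseudo-dihedral angle defined by four consecutive C-alpha atoms, the sketch below computes them and assigns each four-residue window to the nearest of a few centroids. Nearest-centroid assignment is a simplification standing in for the mixture model, and the two centroids and their letters are made up for illustration; they are not the 17 letters of the actual alphabet.

```python
import math


def vsub(u, v):
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def bond_angle(a, b, c):
    """Pseudobond angle at b (degrees) formed by C-alpha atoms a, b, c."""
    u, v = vsub(a, b), vsub(c, b)
    cos_angle = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def dihedral(p0, p1, p2, p3):
    """Signed pseudo-dihedral angle (degrees) about the p1-p2 axis."""
    b0, b1, b2 = vsub(p0, p1), vsub(p2, p1), vsub(p3, p2)
    n = math.sqrt(dot(b1, b1))
    b1 = [x / n for x in b1]
    d0, d2 = dot(b0, b1), dot(b2, b1)
    v = [x - d0 * y for x, y in zip(b0, b1)]  # b0 with the axis component removed
    w = [x - d2 * y for x, y in zip(b2, b1)]  # b2 with the axis component removed
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


def angle_diff(x, y):
    """Difference of two angles wrapped into [-180, 180)."""
    return (x - y + 180.0) % 360.0 - 180.0


# Made-up centroids (bend, bend, dihedral) in degrees, for illustration only.
CENTROIDS = {"A": (90.0, 90.0, 50.0), "B": (120.0, 120.0, -170.0)}


def encode(ca_coords, centroids=CENTROIDS):
    """One letter per window of four consecutive C-alpha atoms."""
    letters = []
    for i in range(len(ca_coords) - 3):
        p0, p1, p2, p3 = ca_coords[i:i + 4]
        angles = (bond_angle(p0, p1, p2), bond_angle(p1, p2, p3),
                  dihedral(p0, p1, p2, p3))
        letters.append(min(centroids, key=lambda k: sum(
            angle_diff(a, c) ** 2 for a, c in zip(angles, centroids[k]))))
    return "".join(letters)


if __name__ == "__main__":
    trace = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)]
    print(encode(trace))
```

The wrapped `angle_diff` matters for the dihedral: 185° and -170° are only 5° apart, which a plain subtraction would miss.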

  • Simplified amino acid alphabets based on deviation of conditional probability from random background
    Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 2002
    Co-Authors: Xin Liu, Di Liu, Ji Qi, Wei-mou Zheng
    Abstract:

    The raw data for deducing the Miyazawa-Jernigan contact energies or the blocks substitution matrix (BLOSUM) consist of pair frequency counts. Each amino acid corresponds to a conditional probability distribution. Based on the deviation of this conditional probability from a random background, a scheme for reducing the amino acid alphabet is proposed. An evident discrepancy is observed between the reduced alphabets obtained from the raw Miyazawa-Jernigan and BLOSUM residue pair counts. Taking the homologous sequence database SCOP40 as a test set, we detect homology with the resulting coarse-grained substitution matrices. It is verified that the reduced alphabets preserve well the information contained in the original 20-letter alphabet.
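
A simplified sketch of alphabet reduction from pair counts: each residue's conditional distribution P(b | a) is compared with the background P(b), and the two groups with the most similar deviation vectors are merged greedily until k groups remain. Pooling counts within a group, the Euclidean distance and the greedy merge order are assumptions for illustration; the abstract does not specify the paper's exact merging scheme, and the toy counts below are invented.

```python
import math


def reduce_alphabet(counts, alphabet, k):
    """Greedy merging of residues with similar deviation vectors.

    counts[a][b]: symmetric residue pair counts (e.g. from aligned blocks).
    For a group G, P(b | G) pools the count rows of its members; the
    deviation vector is P(b | G) - P(b), with P(b) the background frequency.
    """
    total = sum(counts[a][b] for a in alphabet for b in alphabet)
    background = {b: sum(counts[a][b] for a in alphabet) / total for b in alphabet}

    def deviation(group):
        row = {b: sum(counts[a][b] for a in group) for b in alphabet}
        n = sum(row.values())
        return [row[b] / n - background[b] for b in alphabet]

    groups = [(a,) for a in alphabet]
    while len(groups) > k:
        pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
        i, j = min(pairs, key=lambda ij: math.dist(deviation(groups[ij[0]]),
                                                    deviation(groups[ij[1]])))
        merged = groups[i] + groups[j]
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)] + [merged]
    return groups


if __name__ == "__main__":
    counts = {
        "I": {"I": 10, "L": 8, "D": 1, "E": 1},
        "L": {"I": 8, "L": 10, "D": 1, "E": 1},
        "D": {"I": 1, "L": 1, "D": 10, "E": 8},
        "E": {"I": 1, "L": 1, "D": 8, "E": 10},
    }
    print(reduce_alphabet(counts, "ILDE", k=2))
```

With the invented counts, I and L end up in one group and D and E in the other, because each pair co-occurs far more often than the background predicts.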