Alignment Score - Explore the Science & Experts | ideXlab

Alignment Score

The Experts below are selected from a list of 6285 Experts worldwide ranked by ideXlab platform

Mohammed Javeed Zaki – 1st expert on this subject based on the ideXlab platform

  • Indexing protein structures using suffix trees.
    Methods in molecular biology (Clifton N.J.), 2008
    Co-Authors: Feng Gao, Mohammed Javeed Zaki

    Abstract:

    Approaches for indexing proteins and for fast, scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification, and drug discovery. In this chapter, we describe a new method for extracting the local feature vectors of protein structures. Each residue is represented by a triangle, and the correlation between a set of residues is described by the distances between Cα atoms and the angles between the normals of the planes in which the triangles lie. The normalized local feature vectors are indexed using a suffix tree. For all query segments, suffix trees can be used effectively to retrieve the maximal matches, which are then chained to obtain alignments with database proteins. Similar proteins are selected by their alignment score against the query. Our results show classification accuracy of up to 97.8% and 99.4% at the superfamily and class levels, respectively, according to the SCOP classification, and show that on average 7.49 out of 10 proteins from the same superfamily are obtained among the top 10 matches. These results outperform the best previous methods.
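
    The pairwise geometry described in this abstract (a Cα distance plus the angle between the normals of two residue triangles) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the choice of triangle vertices (taken here as backbone N, Cα, C), the function names, and the coordinates are all assumptions, and the normalization step mentioned in the abstract is omitted.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _length(a):
    return math.sqrt(_dot(a, a))

def plane_normal(triangle):
    """Unit normal of the plane through three non-collinear points."""
    p, q, r = triangle
    n = _cross(_sub(q, p), _sub(r, p))
    size = _length(n)
    return (n[0] / size, n[1] / size, n[2] / size)

def local_feature(tri_i, tri_j, ca_index=1):
    """Return (C-alpha distance, angle between plane normals in radians).

    Each triangle is a tuple of three (x, y, z) points; ca_index says which
    vertex is the C-alpha atom (assumed to be the middle one here).
    """
    distance = _length(_sub(tri_i[ca_index], tri_j[ca_index]))
    cos_angle = _dot(plane_normal(tri_i), plane_normal(tri_j))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return distance, math.acos(cos_angle)

# Hypothetical coordinates: two residues whose planes are perpendicular.
residue_a = ((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
residue_b = ((1.0, 0.0, 3.0), (0.0, 0.0, 3.0), (0.0, 0.0, 4.0))
distance, angle = local_feature(residue_a, residue_b)
```

    In an indexing scheme of this kind, such values would then be discretized into symbols so that a suffix tree can index them; the binning is not specified in the abstract and is left out here.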

  • PSIST: indexing protein structures using suffix trees
    2005 IEEE Computational Systems Bioinformatics Conference (CSB'05), 2005
    Co-Authors: Mohammed Javeed Zaki

    Abstract:

    Approaches for indexing proteins, and for fast and scalable searching for structures similar to a query structure, have important applications such as protein structure and function prediction, protein classification, and drug discovery. In this paper, we develop a new method for extracting the local feature vectors of protein structures. Each residue is represented by a triangle, and the correlation between a set of residues is described by the distances between Cα atoms and the angles between the normals of the planes in which the triangles lie. The normalized local feature vectors are indexed using a suffix tree. For all query segments, suffix trees can be used effectively to retrieve the maximal matches, which are then chained to obtain alignments with database proteins. Similar proteins are selected by their alignment score against the query. Our results show classification accuracy of up to 97.8% and 99.4% at the superfamily and class levels, respectively, according to the SCOP classification, and show that on average 7.49 out of 10 proteins from the same superfamily are obtained among the top 10 matches. These results are competitive with the best previous methods.
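
    The step of chaining maximal matches into an alignment can be illustrated with generic co-linear chaining. The sketch below is a textbook quadratic-time dynamic program, not the procedure from the paper: the match representation `(query_start, db_start, length)`, the function name, and the use of total matched length as the chain score are assumptions made for illustration.

```python
def best_chain_score(matches):
    """Score of the best co-linear chain of non-overlapping matches.

    matches: iterable of (query_start, db_start, length) with length >= 1.
    A match may follow another only if it starts at or after the end of the
    earlier one in both the query and the database sequence.
    The chain score is the sum of match lengths (an assumed scoring rule).
    """
    ordered = sorted(matches)  # by query_start, so predecessors come first
    best = [length for (_, _, length) in ordered]
    for j, (q_j, d_j, len_j) in enumerate(ordered):
        for i in range(j):
            q_i, d_i, len_i = ordered[i]
            if q_i + len_i <= q_j and d_i + len_i <= d_j:
                best[j] = max(best[j], best[i] + len_j)
    return max(best, default=0)

# Hypothetical matches between a query and one database protein.
example_matches = [(0, 0, 3), (3, 5, 4), (2, 1, 5), (8, 10, 2)]
chain_score = best_chain_score(example_matches)
```

    Ranking database proteins by a score like this mirrors the abstract's statement that similar proteins are selected by their alignment score against the query.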

Xiaoqiu Huang – 2nd expert on this subject based on the ideXlab platform

  • Sequence-specific sequence comparison using pairwise statistical significance
    Advances in Experimental Medicine and Biology, 2011
    Co-Authors: Xiaoqiu Huang, Ankit Agrawal

    Abstract:

    Sequence comparison is one of the most fundamental computational problems in bioinformatics, and many approaches to it have been and are still being developed. In particular, pairwise sequence alignment forms the crux of both DNA and protein sequence comparison techniques, which in turn form the basis of many other applications in bioinformatics. Pairwise sequence alignment methods align two sequences using a substitution matrix (such as BLOSUM62) consisting of pairwise scores for aligning different residues with each other, and give an alignment score for the given sequence pair. Biologists routinely use such pairwise alignment programs to identify similar, or more specifically, related sequences (having a common ancestor). It is widely accepted that the relatedness of two sequences is better judged by the statistical significance of the alignment score than by the alignment score alone. This research addresses the problem of accurately estimating the statistical significance of pairwise alignment for the purpose of identifying related sequences, by making the sequence comparison process more sequence-specific.
    The major contributions of this research work are as follows. First, sequence-specific strategies for pairwise sequence alignment are used in conjunction with sequence-specific strategies for statistical significance estimation, and accurate methods for pairwise statistical significance estimation using standard, sequence-specific, and position-specific substitution matrices are developed. Second, pairwise statistical significance is used to improve the performance of the most popular database search program, PSI-BLAST. Third, heuristics are designed and implemented to speed up pairwise statistical significance estimation by a factor of more than 200. The implementation of all the methods developed in this work is freely available online.
    With the all-pervasive application of sequence alignment methods in bioinformatics using ever-increasing sequence data, this work is expected to offer useful contributions to the research community.
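
    To make the notion of an alignment score concrete, here is a minimal sketch of a local (Smith-Waterman style) alignment score with a linear gap penalty. It is a generic textbook formulation, not the authors' software. The scoring function `toy_score` and the gap penalty are stand-ins chosen for illustration; a real protein comparison would look up residue pairs in a substitution matrix such as BLOSUM62 and typically use affine gap penalties.

```python
def local_alignment_score(a, b, score, gap=-2):
    """Best local alignment score between sequences a and b.

    score(x, y) returns the substitution score for aligning x with y;
    gap is the (negative) penalty per gap position (linear gaps assumed).
    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            cell = max(
                0,                                        # start a new local alignment
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap,                            # gap in b
                curr[j - 1] + gap,                        # gap in a
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

def toy_score(x, y):
    """Hypothetical substitution scores: +2 for a match, -1 for a mismatch."""
    return 2 if x == y else -1

shared_region = local_alignment_score("XXACGTXX", "YYACGTYY", toy_score)
```

    The raw score depends on sequence lengths and on the scoring system, which is why the abstract argues for judging relatedness by the statistical significance of the score.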

  • Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices
    IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011
    Co-Authors: Ankit Agrawal, Xiaoqiu Huang

    Abstract:

    Pairwise sequence alignment is a central problem in bioinformatics, which forms the basis of various other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by alignment score. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for getting individual significance estimates of pairwise alignment scores. The improvement was mainly attributed to making the statistical significance estimation process more sequence-specific and database-independent. In this paper, we use sequence-specific and position-specific substitution matrices to derive the estimates of pairwise statistical significance, which is expected to use more sequence-specific information in estimating pairwise statistical significance. Experiments on a benchmark database with sequence-specific substitution matrices at different levels of sequence-specific contribution were conducted. The results confirm that using sequence-specific substitution matrices for estimating pairwise statistical significance is significantly better than using a standard matrix like BLOSUM62, and significantly better than the database statistical significance estimates reported by popular database search programs like BLAST, PSI-BLAST (without pretrained PSSMs), and SSEARCH. With pretrained PSSMs, however, PSI-BLAST results are significantly better. Further, using position-specific substitution matrices for estimating pairwise statistical significance gives significantly better results even than PSI-BLAST using pretrained PSSMs.

  • Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty
    BMC Bioinformatics, 2009
    Co-Authors: Ankit Agrawal, Xiaoqiu Huang

    Abstract:

    Background: Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets.
    Results: Results for a knowledge discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, the results of pairwise statistical significance using multiple parameter sets are shown to be significantly better than database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, those of SSEARCH. Using non-zero parameter set change penalty values gives better performance than a zero penalty.
    Conclusion: The fact that homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when using multiple parameter sets. The parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be used effectively to determine the relatedness of one or a few pairs of sequences without performing a time-consuming database search.
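
    The extreme value assumption mentioned in the conclusion is what turns a raw alignment score into a P-value. A minimal sketch, assuming the standard Karlin-Altschul parameterization P(S ≥ x) ≈ 1 − exp(−K·m·n·e^(−λx)) for sequences of lengths m and n: the abstract does not state which parameterization or fitting procedure the authors use, and the values of `lam` and `K` below are purely illustrative, not fitted to any data.

```python
import math

def evd_p_value(score, lam, K, m, n):
    """P-value of a local alignment score under an extreme value distribution.

    lam and K are the distribution parameters; m and n are the lengths of
    the two sequences. Returns 1 - exp(-E), where E = K*m*n*exp(-lam*score)
    is the expected number of chance alignments scoring at least `score`.
    """
    expected = K * m * n * math.exp(-lam * score)
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny.
    return -math.expm1(-expected)

# Illustrative (hypothetical) parameters and sequence lengths.
p_low_score = evd_p_value(score=20, lam=0.3, K=0.1, m=100, n=100)
p_high_score = evd_p_value(score=100, lam=0.3, K=0.1, m=100, n=100)
```

    Under this form, a higher score always yields a smaller P-value for fixed parameters; a pairwise approach differs from a database approach in estimating the parameters for the specific sequence pair instead of taking them from a database-wide model.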

Ankit Agrawal – 3rd expert on this subject based on the ideXlab platform

  • Sequence-specific sequence comparison using pairwise statistical significance
    Advances in Experimental Medicine and Biology, 2011
    Co-Authors: Xiaoqiu Huang, Ankit Agrawal


  • Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices
    IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011
    Co-Authors: Ankit Agrawal, Xiaoqiu Huang


  • Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty
    BMC Bioinformatics, 2009
    Co-Authors: Ankit Agrawal, Xiaoqiu Huang
