Alignment Score
The Experts below are selected from a list of 6285 Experts worldwide ranked by ideXlab platform
Mohammed Javeed Zaki – 1st expert on this subject based on the ideXlab platform

Indexing protein structures using suffix trees.
Methods in Molecular Biology (Clifton, N.J.), 2008
CoAuthors: Feng Gao, Mohammed Javeed Zaki
Abstract: Approaches for indexing proteins and for fast, scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification, and drug discovery. In this chapter, we describe a new method for extracting the local feature vectors of protein structures. Each residue is represented by a triangle, and the correlation between a set of residues is described by the distances between Cα atoms and the angles between the normals of the planes in which the triangles lie. The normalized local feature vectors are indexed using a suffix tree. For all query segments, suffix trees can be used effectively to retrieve the maximal matches, which are then chained to obtain alignments with database proteins. Similar proteins are selected by their alignment score against the query. Our results show classification accuracy of up to 97.8% and 99.4% at the superfamily and class levels, respectively, according to the SCOP classification, and show that on average 7.49 out of 10 proteins from the same superfamily are obtained among the top 10 matches. These results outperform the best previous methods.
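The pairwise geometry described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which atoms form each residue's triangle, so the sketch assumes the backbone N, Cα, and C atoms, and the function names are hypothetical.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def length(a):
    return math.sqrt(dot(a, a))

def triangle_normal(p, q, r):
    """Unit normal of the plane through the three points p, q, r."""
    n = cross(sub(q, p), sub(r, p))
    size = length(n)
    return (n[0] / size, n[1] / size, n[2] / size)

def pair_feature(tri_i, tri_j):
    """Return (C-alpha distance, angle between triangle normals in radians).

    Each triangle is a tuple (N, CA, C) of 3D coordinates; that atom choice
    is an assumption made for this sketch.
    """
    distance = length(sub(tri_i[1], tri_j[1]))
    cos_angle = dot(triangle_normal(*tri_i), triangle_normal(*tri_j))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return distance, math.acos(cos_angle)

# Two toy residues whose triangle planes are perpendicular, 3 units apart.
residue_a = ((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
residue_b = ((1.0, 0.0, 3.0), (0.0, 0.0, 3.0), (0.0, 0.0, 4.0))
distance, angle = pair_feature(residue_a, residue_b)
```

A local feature vector would collect such distance and angle values over a window of neighboring residues; the window size and the normalization step are not specified in the abstract and are left out here.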

PSIST: indexing protein structures using suffix trees
2005 IEEE Computational Systems Bioinformatics Conference (CSB'05), 2005
CoAuthors: Mohammed Javeed Zaki
Abstract: Approaches for indexing proteins and for fast, scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification, and drug discovery. In this paper, we develop a new method for extracting the local feature vectors of protein structures. Each residue is represented by a triangle, and the correlation between a set of residues is described by the distances between Cα atoms and the angles between the normals of the planes in which the triangles lie. The normalized local feature vectors are indexed using a suffix tree. For all query segments, suffix trees can be used effectively to retrieve the maximal matches, which are then chained to obtain alignments with database proteins. Similar proteins are selected by their alignment score against the query. Our results show classification accuracy of up to 97.8% and 99.4% at the superfamily and class levels, respectively, according to the SCOP classification, and show that on average 7.49 out of 10 proteins from the same superfamily are obtained among the top 10 matches. These results are competitive with the best previous methods.
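To make the notion of a maximal match concrete, the sketch below finds every maximal exact match between a query and a database sequence. It is an illustration under assumptions: feature vectors are taken to be already discretized into single-character symbols, and a quadratic-time dynamic-programming table stands in for the suffix tree that the paper uses for efficient retrieval.

```python
def maximal_matches(query, target, min_len=2):
    """Return (query_start, target_start, length) for each maximal exact match.

    A match is maximal when it cannot be extended to the left or to the right.
    This uses a longest-common-suffix table, a simple stand-in for a suffix tree.
    """
    n, m = len(query), len(target)
    # lcs[i][j] = length of the longest common suffix of query[:i] and target[:j]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if query[i - 1] == target[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1

    matches = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = lcs[i][j]
            # Left-maximality is built into the table; check the right side.
            right_maximal = i == n or j == m or query[i] != target[j]
            if k >= min_len and right_maximal:
                matches.append((i - k, j - k, k))
    return matches

# Symbols stand for discretized local feature vectors (hypothetical encoding).
found = maximal_matches("abcxde", "zabcyde")
```

Chaining would then select a consistent, ordered subset of these matches to form an alignment; that step is not shown.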
Xiaoqiu Huang – 2nd expert on this subject based on the ideXlab platform

Sequence-specific sequence comparison using pairwise statistical significance
Advances in Experimental Medicine and Biology, 2011
CoAuthors: Xiaoqiu Huang, Ankit Agrawal
Abstract: Sequence comparison is one of the most fundamental computational problems in bioinformatics, for which many approaches have been and are still being developed. In particular, pairwise sequence alignment forms the crux of both DNA and protein sequence comparison techniques, which in turn form the basis of many other applications in bioinformatics. Pairwise sequence alignment methods align two sequences using a substitution matrix (such as BLOSUM62) consisting of pairwise scores for aligning different residues with each other, and give an alignment score for the given sequence pair. Biologists routinely use such pairwise alignment programs to identify similar, or more specifically, related sequences (having a common ancestor). It is widely accepted that the relatedness of two sequences is better judged by the statistical significance of the alignment score than by the alignment score alone. This research addresses the problem of accurately estimating the statistical significance of pairwise alignment for the purpose of identifying related sequences, by making the sequence comparison process more sequence-specific.
The major contributions of this research work are as follows. Firstly, using sequence-specific strategies for pairwise sequence alignment in conjunction with sequence-specific strategies for statistical significance estimation, wherein accurate methods for pairwise statistical significance estimation using standard, sequence-specific, and position-specific substitution matrices are developed. Secondly, using pairwise statistical significance to improve the performance of the most popular database search program, PSI-BLAST. Thirdly, design and implementation of heuristics to speed up pairwise statistical significance estimation by a factor of more than 200. The implementation of all the methods developed in this work is freely available online.
With the all-pervasive application of sequence alignment methods in bioinformatics using the ever-increasing sequence data, this work is expected to offer useful contributions to the research community.
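As an illustration of how a pairwise alignment score arises from per-residue substitution scores, here is a minimal Smith-Waterman local alignment sketch. It makes simplifying assumptions that are not from the abstract: a toy match/mismatch function replaces a real substitution matrix such as BLOSUM62, and gaps carry a linear penalty rather than an affine one.

```python
def local_alignment_score(a, b, score, gap=-2):
    """Best local alignment score of sequences a and b (Smith-Waterman).

    score(x, y) plays the role of a substitution matrix lookup;
    gap is a linear per-position gap penalty (an assumption of this sketch).
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                0,                                        # start a new local alignment
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align two residues
                prev[j] + gap,                            # gap in b
                cur[j - 1] + gap,                         # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best

def toy_score(x, y):
    """Hypothetical stand-in for a substitution matrix entry."""
    return 2 if x == y else -1

identical = local_alignment_score("ACGT", "ACGT", toy_score)
unrelated = local_alignment_score("AAAA", "TTTT", toy_score)
```

The raw score depends on sequence length and composition, which is why the abstract argues for judging relatedness by statistical significance rather than by the score alone.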

Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011
CoAuthors: Ankit Agrawal, Xiaoqiu Huang
Abstract: Pairwise sequence alignment is a central problem in bioinformatics, which forms the basis of various other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by alignment score. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for getting individual significance estimates of pairwise alignment scores. The improvement was mainly attributed to making the statistical significance estimation process more sequence-specific and database-independent. In this paper, we use sequence-specific and position-specific substitution matrices to derive the estimates of pairwise statistical significance, which is expected to use more sequence-specific information in estimating pairwise statistical significance. Experiments on a benchmark database with sequence-specific substitution matrices at different levels of sequence-specific contribution confirm that using sequence-specific substitution matrices for estimating pairwise statistical significance is significantly better than using a standard matrix such as BLOSUM62, and better than the database statistical significance estimates reported by popular database search programs such as BLAST, PSI-BLAST (without pretrained PSSMs), and SSEARCH. With pretrained PSSMs, however, PSI-BLAST results are significantly better. Further, using position-specific substitution matrices for estimating pairwise statistical significance gives significantly better results even than PSI-BLAST using pretrained PSSMs.

Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty
BMC Bioinformatics, 2009
CoAuthors: Ankit Agrawal, Xiaoqiu Huang
Abstract:
Background: Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets.
Results: Results for a knowledge discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, the results of pairwise statistical significance using multiple parameter sets are shown to be significantly better than the database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, those of SSEARCH. Using nonzero parameter set change penalty values gives better performance than a zero penalty.
Conclusion: The fact that the homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when using multiple parameter sets. The parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be effectively used to determine the relatedness of one or a few pairs of sequences without performing a time-consuming database search.
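The extreme value distribution assumption mentioned in the conclusion can be made concrete with the standard Karlin-Altschul form, in which the probability of a local alignment score of at least x for sequences of lengths m and n is 1 - exp(-K·m·n·e^(-λx)). The sketch below only evaluates that formula; it assumes K and λ have already been estimated for the sequence pair, and the numeric values used are illustrative placeholders, not figures from the paper.

```python
import math

def evd_pvalue(score, m, n, K, lam):
    """P(S >= score) under an extreme value (Gumbel) score distribution.

    m, n : lengths of the two sequences
    K, lam : distribution parameters, assumed to be estimated elsewhere
    """
    expected_hits = K * m * n * math.exp(-lam * score)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x
    return -math.expm1(-expected_hits)

# Illustrative placeholder parameters (hypothetical, not from the paper).
p_low_score = evd_pvalue(30, m=200, n=250, K=0.1, lam=0.3)
p_high_score = evd_pvalue(60, m=200, n=250, K=0.1, lam=0.3)
```

A higher score yields a smaller p-value for fixed parameters, so two sequence pairs with the same raw score can differ in significance when their lengths or estimated parameters differ.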
Ankit Agrawal – 3rd expert on this subject based on the ideXlab platform

Sequence-specific sequence comparison using pairwise statistical significance
Advances in Experimental Medicine and Biology, 2011
CoAuthors: Xiaoqiu Huang, Ankit Agrawal
Abstract: Sequence comparison is one of the most fundamental computational problems in bioinformatics, for which many approaches have been and are still being developed. In particular, pairwise sequence alignment forms the crux of both DNA and protein sequence comparison techniques, which in turn form the basis of many other applications in bioinformatics. Pairwise sequence alignment methods align two sequences using a substitution matrix (such as BLOSUM62) consisting of pairwise scores for aligning different residues with each other, and give an alignment score for the given sequence pair. Biologists routinely use such pairwise alignment programs to identify similar, or more specifically, related sequences (having a common ancestor). It is widely accepted that the relatedness of two sequences is better judged by the statistical significance of the alignment score than by the alignment score alone. This research addresses the problem of accurately estimating the statistical significance of pairwise alignment for the purpose of identifying related sequences, by making the sequence comparison process more sequence-specific.
The major contributions of this research work are as follows. Firstly, using sequence-specific strategies for pairwise sequence alignment in conjunction with sequence-specific strategies for statistical significance estimation, wherein accurate methods for pairwise statistical significance estimation using standard, sequence-specific, and position-specific substitution matrices are developed. Secondly, using pairwise statistical significance to improve the performance of the most popular database search program, PSI-BLAST. Thirdly, design and implementation of heuristics to speed up pairwise statistical significance estimation by a factor of more than 200. The implementation of all the methods developed in this work is freely available online.
With the all-pervasive application of sequence alignment methods in bioinformatics using the ever-increasing sequence data, this work is expected to offer useful contributions to the research community.

Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011
CoAuthors: Ankit Agrawal, Xiaoqiu Huang
Abstract: Pairwise sequence alignment is a central problem in bioinformatics, which forms the basis of various other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by alignment score. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for getting individual significance estimates of pairwise alignment scores. The improvement was mainly attributed to making the statistical significance estimation process more sequence-specific and database-independent. In this paper, we use sequence-specific and position-specific substitution matrices to derive the estimates of pairwise statistical significance, which is expected to use more sequence-specific information in estimating pairwise statistical significance. Experiments on a benchmark database with sequence-specific substitution matrices at different levels of sequence-specific contribution confirm that using sequence-specific substitution matrices for estimating pairwise statistical significance is significantly better than using a standard matrix such as BLOSUM62, and better than the database statistical significance estimates reported by popular database search programs such as BLAST, PSI-BLAST (without pretrained PSSMs), and SSEARCH. With pretrained PSSMs, however, PSI-BLAST results are significantly better. Further, using position-specific substitution matrices for estimating pairwise statistical significance gives significantly better results even than PSI-BLAST using pretrained PSSMs.

Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty
BMC Bioinformatics, 2009
CoAuthors: Ankit Agrawal, Xiaoqiu Huang
Abstract:
Background: Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets.
Results: Results for a knowledge discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, the results of pairwise statistical significance using multiple parameter sets are shown to be significantly better than the database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, those of SSEARCH. Using nonzero parameter set change penalty values gives better performance than a zero penalty.
Conclusion: The fact that the homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when using multiple parameter sets. The parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be effectively used to determine the relatedness of one or a few pairs of sequences without performing a time-consuming database search.