Protein Structure Prediction

The Experts below are selected from a list of 360 Experts worldwide ranked by ideXlab platform

Yang Zhang - One of the best experts on this subject based on the ideXlab platform.

  • mmpred a distance assisted multimodal conformation sampling for de novo Protein Structure Prediction
    Bioinformatics, 2021
    Co-Authors: Kailong Zhao, Yang Zhang, Xiaogen Zhou, Jun Liu, Guijun Zhang
    Abstract:

    Motivation: The mathematically optimal solution in computational Protein folding simulations does not always correspond to the native Structure, due to the imperfection of the energy force fields. There is therefore a need to search for more diverse suboptimal solutions in order to identify the states close to the native. We propose a novel multimodal optimization protocol to improve the conformation sampling efficiency and modeling accuracy of de novo Protein Structure folding simulations. Results: A distance-assisted multimodal optimization sampling algorithm, MMpred, is proposed for de novo Protein Structure Prediction. The protocol consists of three stages. In the first modal exploration stage, a structural similarity evaluation model DMscore is designed to control the diversity of conformations, generating a population of diverse Structures in different low-energy basins. In the second modal maintaining stage, an adaptive clustering algorithm MNDcluster is proposed to divide the populations and merge the modal by adjusting the annealing temperature to locate the promising basins. In the last stage of modal exploitation, a greedy search strategy is used to accelerate the convergence of the modal. Distance constraint information is used to construct the conformation scoring model to guide sampling. MMpred is tested on 320 non-redundant Proteins, where MMpred obtains models with TM-score ≥ 0.5 on 268 cases, which is 20.3% higher than that of Rosetta guided with the same distance constraints. In addition, on 320 benchmark Proteins, the average TM-score of the enhanced version of MMpred (E-MMpred) is 0.732 on the best model, which is comparable to trRosetta (0.730). Availability: The source code and executable are freely available at https://github.com/iobio-zjut/MMpred. Supplementary information: Supplementary data are available at Bioinformatics online.

  • mmpred a distance assisted multimodal conformation sampling for de novo Protein Structure Prediction
    bioRxiv, 2021
    Co-Authors: Kailong Zhao, Yang Zhang, Xiaogen Zhou, Jun Liu, Guijun Zhang
    Abstract:

    Motivation: The mathematically optimal solution in computational Protein folding simulations does not always correspond to the native Structure, due to the imperfection of the energy force fields. There is therefore a need to search for more diverse suboptimal solutions in order to identify the states close to the native. We propose a novel multimodal optimization protocol to improve the conformation sampling efficiency and modeling accuracy of de novo Protein Structure folding simulations. Results: A distance-assisted multimodal optimization sampling algorithm, MMpred, is proposed for de novo Protein Structure Prediction. The protocol consists of three stages: The first is a modal exploration stage, in which a structural similarity evaluation model DMscore is designed to control the diversity of conformations, generating a population of diverse Structures in different low-energy basins. The second is a modal maintaining stage, where an adaptive clustering algorithm MNDcluster is proposed to divide the populations and merge the modal by adjusting the annealing temperature to locate the promising basins. In the last stage of modal exploitation, a greedy search strategy is used to accelerate the convergence of the modal. Distance constraint information is used to construct the conformation scoring model to guide sampling. MMpred is tested on a large set of 320 non-redundant Proteins, where MMpred obtains models with TM-score ≥ 0.5 on 268 cases, which is 20.3% higher than that of Rosetta guided with the same set of distance constraints. The results showed that MMpred can help significantly improve the model accuracy of Protein assembly simulations through the sampling of multiple promising energy basins with enhanced structural diversity. Availability: The source code and executable versions are freely available at https://github.com/iobio-zjut/MMpred. Contact: zgj@zjut.edu.cn or zhng@umich.edu or sujz@wmu.edu.cn

  • cglfold a contact assisted de novo Protein Structure Prediction using global exploration and loop perturbation sampling algorithm
    Bioinformatics, 2020
    Co-Authors: Xiaogen Zhou, Yang Zhang, Guijun Zhang
    Abstract:

    MOTIVATION: Regions that connect secondary Structure elements in a Protein are known as loops, whose slight change will produce dramatic effect on the entire topology. This study investigates whether the accuracy of Protein Structure Prediction can be improved using a loop-specific sampling strategy. RESULTS: A novel de novo Protein Structure Prediction method that combines global exploration and loop perturbation is proposed in this study. In the global exploration phase, the fragment recombination and assembly are used to explore the massive conformational space and generate native-like topology. In the loop perturbation phase, a loop-specific local perturbation model is designed to improve the accuracy of the conformation and is solved by a differential evolution algorithm. These two phases enable a cooperation between global exploration and local exploitation. The filtered contact information is used to construct the conformation selection model for guiding the sampling. The proposed CGLFold is tested on 145 benchmark Proteins, 14 free modeling (FM) targets of CASP13 and 29 FM targets of CASP12. The experimental results show that the loop-specific local perturbation can increase the Structure diversity and success rate of conformational update and gradually improve conformation accuracy. CGLFold obtains template modeling score ≥ 0.5 models on 95 standard test Proteins, 7 FM targets of CASP13 and 9 FM targets of CASP12. AVAILABILITY AND IMPLEMENTATION: The source code and executable versions are freely available at https://github.com/iobio-zjut/CGLFold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  • deep learning contact map guided Protein Structure Prediction in casp13
    Proteins, 2019
    Co-Authors: Wei Zheng, Chengxin Zhang, Robin Pearce, S M Mortuza, Yang Zhang
    Abstract:

    We report the results of two fully automated Structure Prediction pipelines, "Zhang-Server" and "QUARK", in CASP13. The pipelines were built upon the C-I-TASSER and C-QUARK programs, which in turn are based on I-TASSER and QUARK but with three new modules: (a) a novel multiple sequence alignment (MSA) generation protocol to construct deep sequence-profiles for contact Prediction; (b) an improved meta-method, NeBcon, which combines multiple contact predictors, including ResPRE that predicts contact-maps by coupling precision-matrices with deep residual convolutional neural-networks; and (c) an optimized contact potential to guide Structure assembly simulations. For 50 CASP13 FM domains that lacked homologous templates, average TM-scores of the first models produced by C-I-TASSER and C-QUARK were 28% and 56% higher than those constructed by I-TASSER and QUARK, respectively. For the first time, contact-map Predictions demonstrated usefulness on TBM domains with close homologous templates, where TM-scores of C-I-TASSER models were significantly higher than those of I-TASSER models with a P-value <.05. Detailed data analyses showed that the success of C-I-TASSER and C-QUARK was mainly due to the increased accuracy of deep-learning-based contact-maps, as well as the careful balance between sequence-based contact restraints, threading templates, and generic knowledge-based potentials. Nevertheless, challenges still remain for predicting quaternary Structure of multi-domain Proteins, due to the difficulties in domain partitioning and domain reassembly. In addition, contact Prediction in terminal regions was often unsatisfactory due to the sparsity of MSAs. Development of new contact-based domain partitioning and assembly methods and training contact models on sparse MSAs may help address these issues.

  • a comparative assessment and analysis of 20 representative sequence alignment methods for Protein Structure Prediction
    Scientific Reports, 2013
    Co-Authors: Renxiang Yan, Jianyi Yang, Sara E Walker, Yang Zhang
    Abstract:

    Protein sequence alignment is essential for template-based Protein Structure Prediction and function annotation. We collect 20 sequence alignment algorithms, 10 published and 10 newly developed, which cover all representative sequence- and profile-based alignment approaches. These algorithms are benchmarked on 538 non-redundant Proteins for Protein fold-recognition on a uniform template library. Results demonstrate dominant advantage of profile-profile based methods, which generate models with average TM-score 26.5% higher than sequence-profile methods and 49.8% higher than sequence-sequence alignment methods. There is no obvious difference in results between methods with profiles generated from PSI-BLAST PSSM matrix and hidden Markov models. Accuracy of profile-profile alignments can be further improved by 9.6% or 21.4% when predicted or native Structure features are incorporated. Nevertheless, TM-scores from profile-profile methods including experimental structural features are still 37.1% lower than that from TM-align, demonstrating that the fold-recognition problem cannot be solved solely by improving accuracy of Structure feature Predictions.
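Several of the abstracts above report accuracy as a TM-score, with 0.5 used as the cutoff for a correctly predicted fold. The following is a minimal sketch of how such a score can be evaluated for a pair of structures that are already aligned and superimposed. It uses the commonly quoted TM-score definition with the length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. The function names, the 0.5 Å floor on d0, and the toy inputs are illustrative assumptions; tools such as TM-align additionally search for the superposition that maximizes the score, which this sketch does not attempt.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) used by the TM-score."""
    if target_length <= 15:
        return 0.5  # assumption: floor for very short chains
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # assumption: same floor when the formula gives a small value


def tm_score(distances, target_length):
    """TM-score for a fixed superposition.

    distances: Cα-Cα distances (Å) between aligned residue pairs.
    target_length: number of residues in the target (native) structure,
    which may exceed the number of aligned pairs.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    # Each aligned pair contributes a value in (0, 1]; unaligned residues contribute 0.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

A model whose aligned residues all sit exactly on the native positions scores 1.0, and residues that are missing from the alignment lower the score because the sum is still divided by the full target length.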

David Baker - One of the best experts on this subject based on the ideXlab platform.

  • improved Protein Structure Prediction using predicted interresidue orientations
    Proceedings of the National Academy of Sciences of the United States of America, 2020
    Co-Authors: Jianyi Yang, Ivan Anishchenko, Hahnbeom Park, Zhenling Peng, S G Ovchinnikov, David Baker
    Abstract:

    The Prediction of interresidue contacts and distances from coevolutionary data using deep learning has considerably advanced Protein Structure Prediction. Here, we build on these advances by developing a deep residual network for predicting interresidue orientations, in addition to distances, and a Rosetta-constrained energy-minimization protocol for rapidly and accurately generating Structure models guided by these restraints. In benchmark tests on 13th Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13)- and Continuous Automated Model Evaluation (CAMEO)-derived sets, the method outperforms all previously described Structure-Prediction methods. Although trained entirely on native Proteins, the network consistently assigns higher probability to de novo-designed Proteins, identifying the key fold-determining residues and providing an independent quantitative measure of the “ideality” of a Protein Structure. The method promises to be useful for a broad range of Protein Structure Prediction and design problems.

  • improved Protein Structure Prediction using predicted inter residue orientations
    bioRxiv, 2019
    Co-Authors: Jianyi Yang, David Baker, Ivan Anishchenko, Hahnbeom Park, Zhenling Peng, S G Ovchinnikov
    Abstract:

    The Prediction of inter-residue contacts and distances from co-evolutionary data using deep learning has considerably advanced Protein Structure Prediction. Here we build on these advances by developing a deep residual network for predicting inter-residue orientations in addition to distances, and a Rosetta constrained energy minimization protocol for rapidly and accurately generating Structure models guided by these restraints. In benchmark tests on CASP13 and CAMEO derived sets, the method outperforms all previously described Structure Prediction methods. Although trained entirely on native Proteins, the network consistently assigns higher probability to de novo designed Proteins, identifying the key fold determining residues and providing an independent quantitative measure of the "ideality" of a Protein Structure. The method promises to be useful for a broad range of Protein Structure Prediction and design problems.

  • Protein Structure Prediction using rosetta in casp12
    Proteins, 2018
    Co-Authors: S G Ovchinnikov, David E. Kim, Hahnbeom Park, Frank Dimaio, David Baker
    Abstract:

    We describe several notable aspects of our Structure Predictions using Rosetta in CASP12 in the free modeling (FM) and refinement (TR) categories. First, we had previously generated (and published) models for most large Protein families lacking experimentally determined Structures using Rosetta guided by co-evolution based contact Predictions, and for several targets these models proved better starting points for comparative modeling than any known crystal Structure; our model database thus starts to fulfill one of the goals of the original Protein Structure initiative. Second, while our "human" group simply submitted ROBETTA models for most targets, for six targets expert intervention improved Predictions considerably; the largest improvement was for T0886 where we correctly parsed two discontinuous domains guided by predicted contact maps to accurately identify a structural homolog of the same fold. Third, Rosetta all atom refinement followed by MD simulations led to consistent but small improvements when starting models were close to the native Structure, and larger but less consistent improvements when starting models were further away.

  • sampling bottlenecks in de novo Protein Structure Prediction
    Journal of Molecular Biology, 2009
    Co-Authors: Ben Blum, Philip Bradley, David Baker
    Abstract:

    The primary obstacle to de novo Protein Structure Prediction is conformational sampling: the native state generally has lower free energy than nonnative Structures but is exceedingly difficult to locate. Structure Predictions with atomic level accuracy have been made for small Proteins using the Rosetta Structure Prediction method, but for larger and more complex Proteins, the native state is virtually never sampled, and it has been unclear how much of an increase in computing power would be required to successfully predict the Structures of such Proteins. In this paper, we develop an approach to determining how much computer power is required to accurately predict the Structure of a Protein, based on a reformulation of the conformational search problem as a combinatorial sampling problem in a discrete feature space. We find that conformational sampling for many Proteins is limited by critical “linchpin” features, often the backbone torsion angles of individual residues, which are sampled very rarely in unbiased trajectories and, when constrained, dramatically increase the sampling of the native state. These critical features frequently occur in less regular and likely strained regions of Proteins that contribute to Protein function. In a number of Proteins, the linchpin features are in regions found experimentally to form late in folding, suggesting a correspondence between folding in silico and in reality.

  • multipass membrane Protein Structure Prediction using rosetta
    Proteins, 2005
    Co-Authors: Vladimir Yarovyarovoy, Jack Schonbrun, David Baker
    Abstract:

    We describe the adaptation of the Rosetta de novo Structure Prediction method for Prediction of helical transmembrane Protein Structures. The membrane environment is modeled by embedding the Protein chain into a model membrane represented by parallel planes defining hydrophobic, interface, and polar membrane layers for each energy evaluation. The optimal embedding is determined by maximizing the exposure of surface hydrophobic residues within the membrane and minimizing hydrophobic exposure outside of the membrane. Protein conformations are built up using the Rosetta fragment assembly method and evaluated using a new membrane-specific version of the Rosetta low-resolution energy function in which residue-residue and residue-environment interactions are functions of the membrane layer in addition to amino acid identity, distance, and density. We find that lower energy and more native-like Structures are achieved by sequential addition of helices to a growing chain, which may mimic some aspects of helical Protein biogenesis after translocation, rather than folding the whole chain simultaneously as in the Rosetta soluble Protein Prediction method. In tests on 12 membrane Proteins for which the Structure is known, between 51 and 145 residues were predicted with root-mean-square deviation <4 Å from the native Structure.
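The Rosetta abstracts above use predicted inter-residue contacts and distances as restraints for model building. As a rough illustration of what a contact map is, the sketch below derives one from a list of 3D coordinates (for example, one Cβ position per residue). The 8 Å cutoff and the minimum sequence separation are assumptions based on a common convention and are not stated in the abstracts; the function name and toy coordinates are likewise hypothetical.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=3):
    """Return a symmetric boolean matrix of residue-residue contacts.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    cutoff: distance threshold in Å (assumed convention, not from the source).
    min_separation: minimum |i - j| for a pair to be considered, which
    excludes trivially close neighbors along the chain.
    """
    n = len(coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts[i][j] = True
                contacts[j][i] = True
    return contacts


# Toy example: six points on a straight line, spaced 3.8 Å apart.
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(6)]
toy_contacts = contact_map(toy_coords, min_separation=1)
```

In a fully extended toy chain like this one, only residues one or two positions apart fall under the cutoff; a folded chain would show additional long-range contacts, which are the informative ones for structure prediction.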

Jeffrey Skolnick - One of the best experts on this subject based on the ideXlab platform.

  • goap a generalized orientation dependent all atom statistical potential for Protein Structure Prediction
    Biophysical Journal, 2011
    Co-Authors: Hongyi Zhou, Jeffrey Skolnick
    Abstract:

    An accurate scoring function is a key component for successful Protein Structure Prediction. To address this important unsolved problem, we develop a generalized orientation and distance-dependent all-atom statistical potential. The new statistical potential, generalized orientation-dependent all-atom potential (GOAP), depends on the relative orientation of the planes associated with each heavy atom in interacting pairs. GOAP is a generalization of previous orientation-dependent potentials that consider only representative atoms or blocks of side-chain or polar atoms. GOAP is decomposed into distance- and angle-dependent contributions. The DFIRE distance-scaled finite ideal gas reference state is employed for the distance-dependent component of GOAP. GOAP was tested on 11 commonly used decoy sets containing 278 targets, and recognized 226 native Structures as best from the decoys, whereas DFIRE recognized 127 targets. The major improvement comes from decoy sets that have homology-modeled Structures that are close to native (all within ∼4.0 Å) or from the ROSETTA ab initio decoy set. For these two kinds of decoys, orientation-independent DFIRE or only side-chain orientation-dependent RWplus performed poorly. Although the OPUS-PSP block-based orientation-dependent, side-chain atom contact potential performs much better (recognizing 196 targets) than DFIRE, RWplus, and dDFIRE, it is still ∼15% worse than GOAP. Thus, GOAP is a promising advance in knowledge-based, all-atom statistical potentials. GOAP is available for download at http://cssb.biology.gatech.edu/GOAP.

  • touchstone ii a new approach to ab initio Protein Structure Prediction
    Biophysical Journal, 2003
    Co-Authors: Yang Zhang, Andrzej Kolinski, Jeffrey Skolnick
    Abstract:

    We have developed a new combined approach for ab initio Protein Structure Prediction. The Protein conformation is described as a lattice chain connecting Cα atoms, with attached Cβ atoms and side-chain centers of mass. The model force field includes various short-range and long-range knowledge-based potentials derived from a statistical analysis of the regularities of Protein Structures. The combination of these energy terms is optimized through the maximization of correlation for 30 × 60,000 decoys between the root mean square deviation (RMSD) to native and energies, as well as the energy gap between native and the decoy ensemble. To accelerate the conformational search, a newly developed parallel hyperbolic sampling algorithm with a composite movement set is used in the Monte Carlo simulation processes. We exploit this strategy to successfully fold 41/100 small Proteins (36 to 120 residues) with predicted Structures having a RMSD from native below 6.5 Å in the top five cluster centroids. To fold larger-size Proteins as well as to improve the folding yield of small Proteins, we incorporate into the basic force field side-chain contact Predictions from our threading program PROSPECTOR where homologous Proteins were excluded from the data base. With these threading-based restraints, the program can fold 83/125 test Proteins (36 to 174 residues) with Structures having a RMSD to native below 6.5 Å in the top five cluster centroids. This shows the significant improvement of folding by using predicted tertiary restraints, especially when the accuracy of side-chain contact Prediction is >20%. For native fold selection, we introduce quantities dependent on the cluster density and the combination of energy and free energy, which show a higher discriminative power to select the native Structure than the previously used cluster energy or cluster size, and which can be used in native Structure identification in blind simulations. These procedures are readily automated and are being implemented on a genomic scale.
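The GOAP abstract in this list builds its distance-dependent term on the DFIRE reference state, and both are knowledge-based (statistical) potentials. As a hedged sketch of the general idea only, the code below converts observed pair counts per distance bin into energies by comparing them with a reference in which expected counts grow as r^alpha, scaled to the count in the outermost (cutoff) bin. Equal bin widths are assumed, the default exponent of 1.61 is recalled from the DFIRE literature and should be checked against the original paper, and the function name, the unit energy scale, and the zero-energy treatment of empty bins are illustrative assumptions. None of this reproduces the orientation-dependent part of GOAP.

```python
import math


def distance_potential(r_centers, observed, alpha=1.61, kT=1.0):
    """Distance-dependent statistical potential with a DFIRE-style reference.

    r_centers: bin centers in Å, in increasing order; the last bin is the cutoff bin.
    observed: observed pair counts per bin (same length as r_centers).
    alpha: exponent of the reference state (assumed default, see lead-in).
    kT: energy scale; 1.0 gives energies in units of kT.
    """
    if len(r_centers) != len(observed) or not r_centers:
        raise ValueError("r_centers and observed must be non-empty and equal in length")
    r_cut = r_centers[-1]
    n_cut = observed[-1]
    if n_cut <= 0:
        raise ValueError("the cutoff bin must contain observations")
    energies = []
    for r, n_obs in zip(r_centers, observed):
        # Expected count if pairs were distributed like the reference state.
        n_exp = (r / r_cut) ** alpha * n_cut
        if n_obs <= 0:
            energies.append(0.0)  # assumption: no data means no preference
        else:
            # Bins that are over-represented relative to the reference get negative energy.
            energies.append(-kT * math.log(n_obs / n_exp))
    return energies
```

By construction the energy is zero in the cutoff bin, and a bin that holds twice as many pairs as the reference predicts receives an energy of -kT ln 2.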

Adam Liwo - One of the best experts on this subject based on the ideXlab platform.

  • use of restraints from consensus fragments of multiple server models to enhance Protein Structure Prediction capability of the unres force field
    Journal of Chemical Information and Modeling, 2016
    Co-Authors: Magdalena A Mozolewska, Adam Liwo, Jooyoung Lee, Pawel Krupa, Bartlomiej Zaborowski, Keehyoung Joo, Cezary Czaplewski
    Abstract:

    Recently, we developed a new approach to Protein-Structure Prediction, which combines template-based modeling with the physics-based coarse-grained UNited RESidue (UNRES) force field. In this approach, restrained multiplexed replica exchange molecular dynamics simulations with UNRES, with the Cα-distance and virtual-bond-dihedral-angle restraints derived from knowledge-based models are carried out. In this work, we report a test of this approach in the 11th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP11), in which we used the template-based models from early-stage Predictions by the LEE group CASP11 server (group 038, called “nns”), and further improvement of the method. The quality of the models obtained in CASP11 was better than that resulting from unrestrained UNRES simulations; however, the obtained models were generally worse than the final nns models. Calculations with the final nns models, performed after CASP11, resulted in substantial i...

  • physics based Protein Structure Prediction using a hierarchical protocol based on the unres force field assessment in two blind tests
    Proceedings of the National Academy of Sciences of the United States of America, 2005
    Co-Authors: Stanislaw Oldziej, Adam Liwo, Cezary Czaplewski, M Chinchio, Marian Nanias, Jorge A Vila, Mey Khalili, Yelena A Arnautova, Anna Jagielska, Mariusz Makowski
    Abstract:

    Recent improvements in the Protein-Structure Prediction method developed in our laboratory, based on the thermodynamic hypothesis, are described. The conformational space is searched extensively at the united-residue level by using our physics-based UNRES energy function and the conformational space annealing method of global optimization. The lowest-energy coarse-grained Structures are then converted to an all-atom representation and energy-minimized with the ECEPP/3 force field. The procedure was assessed in two recent blind tests of Protein-Structure Prediction. During the first blind test, we predicted large fragments of α and α+β Proteins [60-70 residues with Cα rms deviation (rmsd) <6 Å]. However, for α+β Proteins, significant topological errors occurred despite low rmsd values. In the second exercise, we predicted whole Structures of five Proteins (two α and three α+β, with sizes of 53-235 residues) with remarkably good accuracy. In particular, for the genomic target TM0487 (a 102-residue α+β Protein from Thermotoga maritima), we predicted the complete, topologically correct Structure with 7.3-Å Cα rmsd. So far this Protein is the largest α+β Protein predicted based solely on the amino acid sequence and a physics-based potential-energy function and search procedure. For target T0198, a phosphate transport system regulator PhoU from T. maritima (a 235-residue mainly α-helical Protein), we predicted the topology of the whole six-helix bundle correctly within 8 Å rmsd, except the 32 C-terminal residues, most of which form a β-hairpin. These and other examples described in this work demonstrate significant progress in physics-based Protein-Structure Prediction.

  • Protein Structure Prediction by global optimization of a potential energy function
    Proceedings of the National Academy of Sciences of the United States of America, 1999
    Co-Authors: Adam Liwo, Jooyoung Lee, Daniel R. Ripoll, Jaroslaw Pillardy, Harold A. Scheraga
    Abstract:

    An approach based exclusively on finding the global minimum of an appropriate potential energy function has been used to predict the unknown Structures of five globular Proteins with sizes ranging from 89 to 140 amino acid residues. Comparison of the computed lowest-energy structures of two of them (HDEA and MarA) with the crystal Structures, released by the Protein Data Bank after the Predictions were made, shows that large fragments (61 residues) of both Proteins were predicted with rms deviations of 4.2 and 6.0 Å for the Cα atoms, for HDEA and MarA, respectively. This represents 80% and 53% of the observed Structures of HDEA and MarA, respectively. Similar rms deviations were obtained for ∼60-residue fragments of the other three Proteins. These results constitute an important step toward the Prediction of Protein Structure based solely on global optimization of a potential energy function for a given amino acid sequence. Prediction of Protein Structure based on sequence information alone is one of the challenges of contemporary structural biology. There are three classes of approach to the Structure-Prediction problem: sequence-homology methods, methods based on energetic criteria, and threading methods. In the first method, the unknown Structure is constructed based on known structural motifs whose amino acid sequences are similar to the sequence studied, taking advantage of the empirical relationship between sequence and the three-dimensional Structure (1-6). The methods of the second group (7-9) are based on the thermodynamic hypothesis formulated by Anfinsen (10), according to which the native Structure of a Protein corresponds to the global minimum of its free energy under given conditions. Structure Prediction is therefore achieved by a search for the global minimum of an appropriate potential energy function; this is often called the ab initio or de novo approach. Throughout this paper, the ab initio approach to the Protein-folding problem is meant to refer to methods based solely on global optimization of a potential energy function. The threading methods can be placed between these two approaches: they use the energy (or energy-like) functions to distinguish the native Structure from alternative Structures, but the unknown sequence is superposed on structural motifs chosen from a database of known Protein Structures (11). Although sequence homology and threading methods are thus far the most successful tools for Protein-Structure prediction, their success depends on the presence of sequence- or structural-homologous Proteins in the databases. On the other hand, global optimization of a potential energy function is based on physical grounds, but thus far has had little success. Protein Structure Prediction based solely on the thermodynamic hypothesis has been considered to be unfeasible (12, 13). The reason for this is both the inaccuracy of the potential energy functions devised to represent the Protein energy landscape and the lack of powerful methods for global optimization. Thus, some researchers have introduced variants of ab initio methods that include, as a major part of the procedure, secondary-Structure Predictions and multiple-sequence alignments that are used as constraints in subsequent conformational searches. These methods (14-17) have achieved an important degree of success in predicting the Structures of a number of Proteins. Here, we describe a method for Protein Structure Prediction that is based solely on global optimization of a potential energy function. Methodology. Reduced representations of Proteins (in which each amino acid residue is represented by one or a few interaction sites) have recently been used with great success

Jianlin Cheng - One of the best experts on this subject based on the ideXlab platform.

  • Protein tertiary Structure modeling driven by deep learning and contact distance Prediction in casp13
    International Conference on Bioinformatics, 2019
    Co-Authors: Jianlin Cheng
    Abstract:

    Ab initio Prediction of Protein Structure from sequence is one of the most challenging and important problems in bioinformatics and computational biology. After a long period of stagnation, ab initio Protein Structure Prediction is undergoing a revolution driven by inter-residue contact distance Prediction empowered by deep learning. In this talk, I will present the deep learning and contact distance Prediction methods of our MULTICOM Protein Structure Prediction system, which was ranked among the top three methods in the 13th community-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP13) in 2018 [1]. MULTICOM was able to correctly fold Structures of numerous hard Protein targets from scratch in CASP13, which was unprecedented progress. The success clearly demonstrates that contact distance Prediction is the key direction to tackle the Protein Structure Prediction challenge and deep learning is the key technology to solve it. However, to completely solve the problem, more advanced deep learning methods are needed to accurately predict inter-residue distances when few homologous sequences are available to calculate residue-residue co-evolution scores, fold Proteins from noisy inter-residue distances, and rank the structural models of hard Protein targets.

  • unicon3d de novo Protein Structure Prediction using united residue conformational search via stepwise probabilistic sampling
    Bioinformatics, 2016
    Co-Authors: Debswapna Bhattacharya, Jianlin Cheng
    Abstract:

    Motivation: Recent experimental studies have suggested that Proteins fold via stepwise assembly of structural units named ‘foldons’ through the process of sequential stabilization. In parallel, recent developments on the computational side based on probabilistic modeling have shown a promising direction for de novo Protein conformational sampling from continuous space. However, existing computational approaches for de novo Protein Structure Prediction often randomly sample Protein conformational space, as opposed to the experimentally suggested stepwise sampling. Results: Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of backbone and side chain conformational space of polypeptide chains in a united-residue representation and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units that minimizes a composite physics and knowledge-based energy function for de novo Protein Structure Prediction. The proposed method, UniCon3D, has been found to (i) sample lower energy conformations with higher accuracy than traditional random sampling in a small benchmark of 6 Proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling for Protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP. Availability and Implementation: Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/.
Contact: chengji@missouri.edu Supplementary information: Supplementary data are available at Bioinformatics online.

  • Protein single model quality assessment by feature based probability density functions
    Scientific Reports, 2016
    Co-Authors: Renzhi Cao, Jianlin Cheng
    Abstract:

    Protein quality assessment (QA) has played an important role in Protein Structure Prediction. We developed a novel single-model quality assessment method, Qprob. Qprob calculates the absolute error for each Protein feature value against the true quality scores (i.e. GDT-TS scores) of Protein structural models, and uses these errors to estimate a probability density distribution for quality assessment. Qprob has been blindly tested on the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM-NOVEL server. The official CASP result shows that Qprob ranks as one of the top single-model QA methods. In addition, Qprob makes contributions to our Protein tertiary Structure predictor MULTICOM, which is officially ranked 3rd out of 143 predictors. The good performance shows that Qprob is good at assessing the quality of models of hard targets. These results demonstrate that this new probability density distribution based method is effective for Protein single-model quality assessment and is useful for Protein Structure Prediction. The webserver of Qprob is available at: http://calla.rnet.missouri.edu/qprob/. The software is now freely available in the web server of Qprob.
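
The idea summarized above, turning per-feature errors against true GDT-TS scores into probability densities, can be illustrated with a small Python sketch. This is a simplified illustration and not the Qprob implementation: it assumes every feature is scaled to [0, 1], models each feature's absolute error as a Gaussian, and combines features by summing densities over a grid of candidate scores. The function names, the Gaussian assumption, the grid search, and the `1e-3` variance floor are all choices made for this example.

```python
import math
from statistics import mean, pstdev


def fit_error_models(train_features, train_gdt):
    """Estimate (mean, std) of the absolute error |feature - GDT-TS| per feature.

    train_features: list of feature vectors, each value scaled to [0, 1].
    train_gdt: true GDT-TS scores in [0, 1], one per training model.
    """
    n_features = len(train_features[0])
    models = []
    for j in range(n_features):
        errors = [abs(f[j] - g) for f, g in zip(train_features, train_gdt)]
        sigma = max(pstdev(errors), 1e-3)  # floor avoids a zero-width density
        models.append((mean(errors), sigma))
    return models


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def predict_quality(features, models, grid_size=1001):
    """Return the candidate score in [0, 1] with the highest combined density."""
    best_score, best_density = 0.0, -1.0
    for k in range(grid_size):
        candidate = k / (grid_size - 1)
        # Density of the error each feature would have if `candidate` were true.
        density = sum(
            normal_pdf(abs(f - candidate), mu, sigma)
            for f, (mu, sigma) in zip(features, models)
        )
        if density > best_density:
            best_score, best_density = candidate, density
    return best_score
```

In this sketch a feature whose error is consistently small contributes a sharp density peak near its own value, so it dominates the combined estimate, while a noisy feature contributes a broad, low density.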

  • Protein single model quality assessment by feature based probability density functions
    arXiv: Quantitative Methods, 2016
    Co-Authors: Renzhi Cao, Jianlin Cheng
    Abstract:

    Protein quality assessment (QA) has played an important role in Protein Structure Prediction. We developed a novel single-model quality assessment method, Qprob. Qprob calculates the absolute error for each Protein feature value against the true quality scores (i.e. GDT-TS scores) of Protein structural models, and uses these errors to estimate a probability density distribution for quality assessment. Qprob has been blindly tested on the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM-NOVEL server. The official CASP result shows that Qprob ranks as one of the top single-model QA methods. In addition, Qprob makes contributions to our Protein tertiary Structure predictor MULTICOM, which is officially ranked 3rd out of 143 predictors. The good performance shows that Qprob is good at assessing the quality of models of hard targets. These results demonstrate that this new probability density distribution based method is effective for Protein single-model quality assessment and is useful for Protein Structure Prediction. The webserver and software packages of Qprob are available at: this http URL.

  • a large scale conformation sampling and evaluation server for Protein tertiary Structure Prediction and its assessment in casp11
    BMC Bioinformatics, 2015
    Co-Authors: Renzhi Cao, Jianlin Cheng
    Abstract:

    With more and more Protein sequences produced in the genomic era, predicting Protein Structures from sequences becomes very important for elucidating the molecular details and functions of these Proteins for biomedical research. Traditional template-based Protein Structure Prediction methods tend to focus on identifying the best templates, generating the best alignments, and applying the best energy function to rank models, which often cannot achieve the best performance because of the difficulty of obtaining the best templates, alignments, and models. We developed a large-scale conformation sampling and evaluation method and its servers to improve the reliability and robustness of Protein Structure Prediction. In the first step, our method used a variety of alignment methods to sample relevant and complementary templates and to generate alternative and diverse target-template alignments, used a template and alignment combination protocol to combine alignments, and used template-based and template-free modeling methods to generate a pool of conformations for a target Protein. In the second step, it used a large number of Protein model quality assessment methods to evaluate and rank the models in the Protein model pool, in conjunction with an exception handling strategy to deal with any additional failure in model ranking. The method was implemented as two Protein Structure Prediction servers, MULTICOM-CONSTRUCT and MULTICOM-CLUSTER, which participated in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) in 2014. The two servers were ranked among the top 10 server predictors. The good performance of our servers in CASP11 demonstrates the effectiveness and robustness of the large-scale conformation sampling and evaluation. The MULTICOM server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/.
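
The second step described above, ranking a pool of models with many quality assessment methods while tolerating failures, can be sketched in Python as follows. This is a hypothetical illustration rather than the MULTICOM code: the abstract does not specify how scores are combined or what the exception handling strategy does, so averaging the scores, skipping any QA method that fails, and falling back to the input order when every method fails are assumptions made for this example.

```python
def rank_models(models, qa_methods):
    """Rank models by their average score over the QA methods that succeed.

    models: list of unique, hashable model identifiers.
    qa_methods: callables that take the model list and return one score per
        model (higher means better).
    """
    totals = {m: 0.0 for m in models}
    used = 0
    for qa in qa_methods:
        try:
            scores = qa(models)
            if len(scores) != len(models):
                raise ValueError("QA method returned a wrong number of scores")
        except Exception:
            continue  # assumed strategy: ignore a QA method that fails
        for m, s in zip(models, scores):
            totals[m] += s
        used += 1
    if used == 0:
        return list(models)  # assumed fallback: keep the input order
    return sorted(models, key=lambda m: totals[m] / used, reverse=True)
```

Because failed methods are simply left out of the average, one crashing predictor does not prevent the remaining methods from producing a ranking.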