Training Dataset

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 360 Experts worldwide ranked by ideXlab platform

Kuochen Chou - One of the best experts on this subject based on the ideXlab platform.

  • ploc_bal mhum predict subcellular localization of human proteins by pseaac and quasi balancing Training Dataset
    Genomics, 2019
    Co-Authors: Kuochen Chou, Xiang Cheng, Xuan Xiao
    Abstract:

    Abstract A cell contains numerous protein molecules. One of the fundamental goals in molecular cell biology is to determine their subcellular locations since this information is extremely important to both basic research and drug development. In this paper, we report a novel and very powerful predictor called “pLoc_bal-mHum” for predicting the subcellular localization of human proteins based on their sequence information alone. Cross-validation tests on exactly the same experiment-confirmed Dataset have indicated that the new predictor is remarkably superior to the existing state-of-the-art predictor in identifying the subcellular localization of human proteins. To maximize the convenience for the majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mHum/ , by which users can easily get their desired results without the need to go through the detailed mathematics.

  • ploc_bal meuk predict subcellular localization of eukaryotic proteins by general pseaac and quasi balancing Training Dataset
    Medicinal Chemistry, 2019
    Co-Authors: Kuochen Chou, Xiang Cheng, Xuan Xiao
    Abstract:

    Background/objective Information of protein subcellular localization is crucially important for both basic research and drug development. With the explosive growth of protein sequences discovered in the post-genomic age, it is highly demanded to develop powerful bioinformatics tools for timely and effectively identifying their subcellular localization purely based on the sequence information alone. Recently, a predictor called "pLoc-mEuk" was developed for identifying the subcellular localization of eukaryotic proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems where many proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mEuk was trained by an extremely skewed Dataset where some subset was about 200 times the size of the other subsets. Accordingly, it cannot avoid the biased consequence caused by such an uneven Training Dataset. Methods To alleviate such bias, we have developed a new predictor called pLoc_bal-mEuk by quasi-balancing the Training Dataset. Cross-validation tests on exactly the same experimentconfirmed Dataset have indicated that the proposed new predictor is remarkably superior to pLocmEuk, the existing state-of-the-art predictor in identifying the subcellular localization of eukaryotic proteins. It has not escaped our notice that the quasi-balancing treatment can also be used to deal with many other biological systems. Results To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mEuk/. Conclusion It is anticipated that the pLoc_bal-Euk predictor holds very high potential to become a useful high throughput tool in identifying the subcellular localization of eukaryotic proteins, particularly for finding multi-target drugs that is currently a very hot trend trend in drug development.

  • ploc_bal mvirus predict subcellular localization of multi label virus proteins by chou s general pseaac and ihts treatment to balance Training Dataset
    Medicinal Chemistry, 2019
    Co-Authors: Xuan Xiao, Xiang Cheng, Genqiang Chen, Qi Mao, Kuochen Chou
    Abstract:

    Background/objective Knowledge of protein subcellular localization is vitally important for both basic research and drug development. Facing the avalanche of protein sequences emerging in the post-genomic age, it is urgent to develop computational tools for timely and effectively identifying their subcellular localization based on the sequence information alone. Recently, a predictor called "pLoc-mVirus" was developed for identifying the subcellular localization of virus proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, known as "multiplex proteins", may simultaneously occur in, or move between two or more subcellular location sites. Despite the fact that it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mVirus was trained by an extremely skewed Dataset in which some subset was over 10 times the size of the other subsets. Accordingly, it cannot avoid the biased consequence caused by such an uneven Training Dataset. Methods Using the Chou's general PseAAC (Pseudo Amino Acid Composition) approach and the IHTS (Inserting Hypothetical Training Samples) treatment to balance out the Training Dataset, we have developed a new predictor called "pLoc_bal-mVirus" for predicting the subcellular localization of multi-label virus proteins. Results Cross-validation tests on exactly the same experiment-confirmed Dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mVirus, the existing state-of-theart predictor for the same purpose. Conclusion Its user-friendly web-server is available at http://www.jci-bioinfo.cn/pLoc_balmVirus/, by which the majority of experimental scientists can easily get their desired results without the need to go through the detailed complicated mathematics. Accordingly, pLoc_bal-mVirus will become a very useful tool for designing multi-target drugs and in-depth understanding of the biological process in a cell.

  • ploc_bal mgpos predict subcellular localization of gram positive bacterial proteins by quasi balancing Training Dataset and pseaac
    Genomics, 2019
    Co-Authors: Xuan Xiao, Xiang Cheng, Genqiang Chen, Qi Mao, Kuochen Chou
    Abstract:

    Abstract Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desired to develop computational tools for timely and effectively identifying their subcellular localization purely based on the sequence information alone. Recently, a predictor called “pLoc-mGpos” was developed for identifying the subcellular localization of Gram-positive bacterial proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called “multiplex proteins”, may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mGpos was trained by an extremely skewed Dataset in which some subset (subcellular location) was over 11 times the size of the other subsets. Accordingly, it cannot avoid the bias consequence caused by such an uneven Training Dataset. To alleviate such bias consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mGpos by quasi-balancing the Training Dataset. Rigorous target jackknife tests on exactly the same experiment-confirmed Dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mGpos, the existing state-of-the-art predictor in identifying the subcellular localization of Gram-positive bacterial proteins. To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mGpos/ , by which users can easily get their desired results without the need to go through the detailed mathematics.

  • ploc_bal manimal predict subcellular localization of animal proteins by balancing Training Dataset and pseaac
    Bioinformatics, 2019
    Co-Authors: Xiang Cheng, Weizhong Lin, Xuan Xiao, Kuochen Chou
    Abstract:

    Motivation A cell contains numerous protein molecules. One of the fundamental goals in cell biology is to determine their subcellular locations, which can provide useful clues about their functions. Knowledge of protein subcellular localization is also indispensable for prioritizing and selecting the right targets for drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desired to develop computational tools for timely and effectively identifying their subcellular localization based on the sequence information alone. Recently, a predictor called 'pLoc-mAnimal' was developed for identifying the subcellular localization of animal proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with the multi-label systems in which some proteins, called 'multiplex proteins', may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mAnimal was trained by an extremely skewed Dataset in which some subset (subcellular location) was about 128 times the size of the other subsets. Accordingly, such an uneven Training Dataset will inevitably cause a biased consequence. Results To alleviate such biased consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mAnimal by quasi-balancing the Training Dataset. Cross-validation tests on exactly the same experiment-confirmed Dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mAnimal, the existing state-of-the-art predictor, in identifying the subcellular localization of animal proteins. Availability and implementation To maximize the convenience for the vast majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mAnimal/, by which users can easily get their desired results without the need to go through the complicated mathematics. Supplementary information Supplementary data are available at Bioinformatics online.

Wenhua Chen - One of the best experts on this subject based on the ideXlab platform.

  • dimension reduction aided hyperspectral image classification with a small sized Training Dataset experimental comparisons
    Sensors, 2017
    Co-Authors: Cunjia Liu, Lei Guo, Wenhua Chen
    Abstract:

    Hyperspectral images (HSI) provide rich information which may not be captured by other sensing technologies and therefore gradually find a wide range of applications. However, they also generate a large amount of irrelevant or redundant data for a specific task. This causes a number of issues including significantly increased computation time, complexity and scale of prediction models mapping the data to semantics (e.g., classification), and the need of a large amount of labelled data for Training. Particularly, it is generally difficult and expensive for experts to acquire sufficient Training samples in many applications. This paper addresses these issues by exploring a number of classical dimension reduction algorithms in machine learning communities for HSI classification. To reduce the size of Training Dataset, feature selection (e.g., mutual information, minimal redundancy maximal relevance) and feature extraction (e.g., Principal Component Analysis (PCA), Kernel PCA) are adopted to augment a baseline classification method, Support Vector Machine (SVM). The proposed algorithms are evaluated using a real HSI Dataset. It is shown that PCA yields the most promising performance in reducing the number of features or spectral bands. It is observed that while significantly reducing the computational complexity, the proposed method can achieve better classification results over the classic SVM on a small Training Dataset, which makes it suitable for real-time applications or when only limited Training data are available. Furthermore, it can also achieve performances similar to the classic SVM on large Datasets but with much less computing time.

Xuan Xiao - One of the best experts on this subject based on the ideXlab platform.

  • ploc_bal mhum predict subcellular localization of human proteins by pseaac and quasi balancing Training Dataset
    Genomics, 2019
    Co-Authors: Kuochen Chou, Xiang Cheng, Xuan Xiao
    Abstract:

    Abstract A cell contains numerous protein molecules. One of the fundamental goals in molecular cell biology is to determine their subcellular locations since this information is extremely important to both basic research and drug development. In this paper, we report a novel and very powerful predictor called “pLoc_bal-mHum” for predicting the subcellular localization of human proteins based on their sequence information alone. Cross-validation tests on exactly the same experiment-confirmed Dataset have indicated that the new predictor is remarkably superior to the existing state-of-the-art predictor in identifying the subcellular localization of human proteins. To maximize the convenience for the majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mHum/ , by which users can easily get their desired results without the need to go through the detailed mathematics.

  • ploc_bal meuk predict subcellular localization of eukaryotic proteins by general pseaac and quasi balancing Training Dataset
    Medicinal Chemistry, 2019
    Co-Authors: Kuochen Chou, Xiang Cheng, Xuan Xiao
    Abstract:

    Background/objective Information of protein subcellular localization is crucially important for both basic research and drug development. With the explosive growth of protein sequences discovered in the post-genomic age, it is highly demanded to develop powerful bioinformatics tools for timely and effectively identifying their subcellular localization purely based on the sequence information alone. Recently, a predictor called "pLoc-mEuk" was developed for identifying the subcellular localization of eukaryotic proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems where many proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mEuk was trained by an extremely skewed Dataset where some subset was about 200 times the size of the other subsets. Accordingly, it cannot avoid the biased consequence caused by such an uneven Training Dataset. Methods To alleviate such bias, we have developed a new predictor called pLoc_bal-mEuk by quasi-balancing the Training Dataset. Cross-validation tests on exactly the same experimentconfirmed Dataset have indicated that the proposed new predictor is remarkably superior to pLocmEuk, the existing state-of-the-art predictor in identifying the subcellular localization of eukaryotic proteins. It has not escaped our notice that the quasi-balancing treatment can also be used to deal with many other biological systems. Results To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mEuk/. Conclusion It is anticipated that the pLoc_bal-Euk predictor holds very high potential to become a useful high throughput tool in identifying the subcellular localization of eukaryotic proteins, particularly for finding multi-target drugs that is currently a very hot trend trend in drug development.

  • ploc_bal mvirus predict subcellular localization of multi label virus proteins by chou s general pseaac and ihts treatment to balance Training Dataset
    Medicinal Chemistry, 2019
    Co-Authors: Xuan Xiao, Xiang Cheng, Genqiang Chen, Qi Mao, Kuochen Chou
    Abstract:

    Background/objective Knowledge of protein subcellular localization is vitally important for both basic research and drug development. Facing the avalanche of protein sequences emerging in the post-genomic age, it is urgent to develop computational tools for timely and effectively identifying their subcellular localization based on the sequence information alone. Recently, a predictor called "pLoc-mVirus" was developed for identifying the subcellular localization of virus proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, known as "multiplex proteins", may simultaneously occur in, or move between two or more subcellular location sites. Despite the fact that it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mVirus was trained by an extremely skewed Dataset in which some subset was over 10 times the size of the other subsets. Accordingly, it cannot avoid the biased consequence caused by such an uneven Training Dataset. Methods Using the Chou's general PseAAC (Pseudo Amino Acid Composition) approach and the IHTS (Inserting Hypothetical Training Samples) treatment to balance out the Training Dataset, we have developed a new predictor called "pLoc_bal-mVirus" for predicting the subcellular localization of multi-label virus proteins. Results Cross-validation tests on exactly the same experiment-confirmed Dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mVirus, the existing state-of-theart predictor for the same purpose. Conclusion Its user-friendly web-server is available at http://www.jci-bioinfo.cn/pLoc_balmVirus/, by which the majority of experimental scientists can easily get their desired results without the need to go through the detailed complicated mathematics. Accordingly, pLoc_bal-mVirus will become a very useful tool for designing multi-target drugs and in-depth understanding of the biological process in a cell.

  • ploc_bal mgpos predict subcellular localization of gram positive bacterial proteins by quasi balancing Training Dataset and pseaac
    Genomics, 2019
    Co-Authors: Xuan Xiao, Xiang Cheng, Genqiang Chen, Qi Mao, Kuochen Chou
    Abstract:

    Abstract Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desired to develop computational tools for timely and effectively identifying their subcellular localization purely based on the sequence information alone. Recently, a predictor called “pLoc-mGpos” was developed for identifying the subcellular localization of Gram-positive bacterial proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called “multiplex proteins”, may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mGpos was trained by an extremely skewed Dataset in which some subset (subcellular location) was over 11 times the size of the other subsets. Accordingly, it cannot avoid the bias consequence caused by such an uneven Training Dataset. To alleviate such bias consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mGpos by quasi-balancing the Training Dataset. Rigorous target jackknife tests on exactly the same experiment-confirmed Dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mGpos, the existing state-of-the-art predictor in identifying the subcellular localization of Gram-positive bacterial proteins. To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mGpos/ , by which users can easily get their desired results without the need to go through the detailed mathematics.

  • ploc_bal manimal predict subcellular localization of animal proteins by balancing Training Dataset and pseaac
    Bioinformatics, 2019
    Co-Authors: Xiang Cheng, Weizhong Lin, Xuan Xiao, Kuochen Chou
    Abstract:

    Motivation A cell contains numerous protein molecules. One of the fundamental goals in cell biology is to determine their subcellular locations, which can provide useful clues about their functions. Knowledge of protein subcellular localization is also indispensable for prioritizing and selecting the right targets for drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desired to develop computational tools for timely and effectively identifying their subcellular localization based on the sequence information alone. Recently, a predictor called 'pLoc-mAnimal' was developed for identifying the subcellular localization of animal proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with the multi-label systems in which some proteins, called 'multiplex proteins', may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mAnimal was trained by an extremely skewed Dataset in which some subset (subcellular location) was about 128 times the size of the other subsets. Accordingly, such an uneven Training Dataset will inevitably cause a biased consequence. Results To alleviate such biased consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mAnimal by quasi-balancing the Training Dataset. Cross-validation tests on exactly the same experiment-confirmed Dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mAnimal, the existing state-of-the-art predictor, in identifying the subcellular localization of animal proteins. Availability and implementation To maximize the convenience for the vast majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mAnimal/, by which users can easily get their desired results without the need to go through the complicated mathematics. Supplementary information Supplementary data are available at Bioinformatics online.

Cunjia Liu - One of the best experts on this subject based on the ideXlab platform.

  • dimension reduction aided hyperspectral image classification with a small sized Training Dataset experimental comparisons
    Sensors, 2017
    Co-Authors: Cunjia Liu, Lei Guo, Wenhua Chen
    Abstract:

    Hyperspectral images (HSI) provide rich information which may not be captured by other sensing technologies and therefore gradually find a wide range of applications. However, they also generate a large amount of irrelevant or redundant data for a specific task. This causes a number of issues including significantly increased computation time, complexity and scale of prediction models mapping the data to semantics (e.g., classification), and the need of a large amount of labelled data for Training. Particularly, it is generally difficult and expensive for experts to acquire sufficient Training samples in many applications. This paper addresses these issues by exploring a number of classical dimension reduction algorithms in machine learning communities for HSI classification. To reduce the size of Training Dataset, feature selection (e.g., mutual information, minimal redundancy maximal relevance) and feature extraction (e.g., Principal Component Analysis (PCA), Kernel PCA) are adopted to augment a baseline classification method, Support Vector Machine (SVM). The proposed algorithms are evaluated using a real HSI Dataset. It is shown that PCA yields the most promising performance in reducing the number of features or spectral bands. It is observed that while significantly reducing the computational complexity, the proposed method can achieve better classification results over the classic SVM on a small Training Dataset, which makes it suitable for real-time applications or when only limited Training data are available. Furthermore, it can also achieve performances similar to the classic SVM on large Datasets but with much less computing time.

Xiang Cheng - One of the best experts on this subject based on the ideXlab platform.

  • ploc_bal mhum predict subcellular localization of human proteins by pseaac and quasi balancing Training Dataset
    Genomics, 2019
    Co-Authors: Kuochen Chou, Xiang Cheng, Xuan Xiao
    Abstract:

    Abstract A cell contains numerous protein molecules. One of the fundamental goals in molecular cell biology is to determine their subcellular locations since this information is extremely important to both basic research and drug development. In this paper, we report a novel and very powerful predictor called “pLoc_bal-mHum” for predicting the subcellular localization of human proteins based on their sequence information alone. Cross-validation tests on exactly the same experiment-confirmed Dataset have indicated that the new predictor is remarkably superior to the existing state-of-the-art predictor in identifying the subcellular localization of human proteins. To maximize the convenience for the majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mHum/ , by which users can easily get their desired results without the need to go through the detailed mathematics.

  • ploc_bal meuk predict subcellular localization of eukaryotic proteins by general pseaac and quasi balancing Training Dataset
    Medicinal Chemistry, 2019
    Co-Authors: Kuochen Chou, Xiang Cheng, Xuan Xiao
    Abstract:

    Background/objective Information of protein subcellular localization is crucially important for both basic research and drug development. With the explosive growth of protein sequences discovered in the post-genomic age, it is highly demanded to develop powerful bioinformatics tools for timely and effectively identifying their subcellular localization purely based on the sequence information alone. Recently, a predictor called "pLoc-mEuk" was developed for identifying the subcellular localization of eukaryotic proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems where many proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mEuk was trained by an extremely skewed Dataset where some subset was about 200 times the size of the other subsets. Accordingly, it cannot avoid the biased consequence caused by such an uneven Training Dataset. Methods To alleviate such bias, we have developed a new predictor called pLoc_bal-mEuk by quasi-balancing the Training Dataset. Cross-validation tests on exactly the same experimentconfirmed Dataset have indicated that the proposed new predictor is remarkably superior to pLocmEuk, the existing state-of-the-art predictor in identifying the subcellular localization of eukaryotic proteins. It has not escaped our notice that the quasi-balancing treatment can also be used to deal with many other biological systems. Results To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mEuk/. Conclusion It is anticipated that the pLoc_bal-Euk predictor holds very high potential to become a useful high throughput tool in identifying the subcellular localization of eukaryotic proteins, particularly for finding multi-target drugs that is currently a very hot trend trend in drug development.

  • ploc_bal mvirus predict subcellular localization of multi label virus proteins by chou s general pseaac and ihts treatment to balance Training Dataset
    Medicinal Chemistry, 2019
    Co-Authors: Xuan Xiao, Xiang Cheng, Genqiang Chen, Qi Mao, Kuochen Chou
    Abstract:

    Background/objective Knowledge of protein subcellular localization is vitally important for both basic research and drug development. Facing the avalanche of protein sequences emerging in the post-genomic age, it is urgent to develop computational tools for timely and effectively identifying their subcellular localization based on the sequence information alone. Recently, a predictor called "pLoc-mVirus" was developed for identifying the subcellular localization of virus proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, known as "multiplex proteins", may simultaneously occur in, or move between two or more subcellular location sites. Despite the fact that it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mVirus was trained by an extremely skewed Dataset in which some subset was over 10 times the size of the other subsets. Accordingly, it cannot avoid the biased consequence caused by such an uneven Training Dataset. Methods Using the Chou's general PseAAC (Pseudo Amino Acid Composition) approach and the IHTS (Inserting Hypothetical Training Samples) treatment to balance out the Training Dataset, we have developed a new predictor called "pLoc_bal-mVirus" for predicting the subcellular localization of multi-label virus proteins. Results Cross-validation tests on exactly the same experiment-confirmed Dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mVirus, the existing state-of-theart predictor for the same purpose. Conclusion Its user-friendly web-server is available at http://www.jci-bioinfo.cn/pLoc_balmVirus/, by which the majority of experimental scientists can easily get their desired results without the need to go through the detailed complicated mathematics. Accordingly, pLoc_bal-mVirus will become a very useful tool for designing multi-target drugs and in-depth understanding of the biological process in a cell.

  • ploc_bal mgpos predict subcellular localization of gram positive bacterial proteins by quasi balancing Training Dataset and pseaac
    Genomics, 2019
    Co-Authors: Xuan Xiao, Xiang Cheng, Genqiang Chen, Qi Mao, Kuochen Chou
    Abstract:

    Abstract Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desired to develop computational tools for timely and effectively identifying their subcellular localization purely based on the sequence information alone. Recently, a predictor called “pLoc-mGpos” was developed for identifying the subcellular localization of Gram-positive bacterial proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called “multiplex proteins”, may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mGpos was trained by an extremely skewed Dataset in which some subset (subcellular location) was over 11 times the size of the other subsets. Accordingly, it cannot avoid the bias consequence caused by such an uneven Training Dataset. To alleviate such bias consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mGpos by quasi-balancing the Training Dataset. Rigorous target jackknife tests on exactly the same experiment-confirmed Dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mGpos, the existing state-of-the-art predictor in identifying the subcellular localization of Gram-positive bacterial proteins. To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mGpos/ , by which users can easily get their desired results without the need to go through the detailed mathematics.

  • ploc_bal manimal predict subcellular localization of animal proteins by balancing Training Dataset and pseaac
    Bioinformatics, 2019
    Co-Authors: Xiang Cheng, Weizhong Lin, Xuan Xiao, Kuochen Chou
    Abstract:

    Motivation A cell contains numerous protein molecules. One of the fundamental goals in cell biology is to determine their subcellular locations, which can provide useful clues about their functions. Knowledge of protein subcellular localization is also indispensable for prioritizing and selecting the right targets for drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desired to develop computational tools for timely and effectively identifying their subcellular localization based on the sequence information alone. Recently, a predictor called 'pLoc-mAnimal' was developed for identifying the subcellular localization of animal proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with the multi-label systems in which some proteins, called 'multiplex proteins', may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mAnimal was trained by an extremely skewed Dataset in which some subset (subcellular location) was about 128 times the size of the other subsets. Accordingly, such an uneven Training Dataset will inevitably cause a biased consequence. Results To alleviate such biased consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mAnimal by quasi-balancing the Training Dataset. Cross-validation tests on exactly the same experiment-confirmed Dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mAnimal, the existing state-of-the-art predictor, in identifying the subcellular localization of animal proteins. Availability and implementation To maximize the convenience for the vast majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mAnimal/, by which users can easily get their desired results without the need to go through the complicated mathematics. Supplementary information Supplementary data are available at Bioinformatics online.