Data Mining System

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 80271 Experts worldwide ranked by ideXlab platform

Vladimir Brusic - One of the best experts on this subject based on the ideXlab platform.

  • HPVdb: a Data Mining System for knowledge discovery in human papillomavirus with applications in T cell immunology and vaccinology.
    Database : the journal of biological databases and curation, 2014
    Co-Authors: Guang Lan Zhang, Angelika B. Riemer, Ellis L. Reinherz, Lou Chitkushev, Derin B Keskin, Vladimir Brusic
    Abstract:

    High-risk human papillomaviruses (HPVs) are the causes of many cancers, including cervical, anal, vulvar, vaginal, penile and oropharyngeal. To facilitate diagnosis, prognosis and characterization of these cancers, it is necessary to make full use of the immunological data on HPV available through publications, technical reports and databases. These data vary in granularity, quality and complexity. The extraction of knowledge from the vast amount of immunological data using data mining techniques remains a challenging task. To support integration of data and knowledge in virology and vaccinology, we developed a framework called KB-builder to streamline the development and deployment of web-accessible immunological knowledge systems. The framework consists of seven major functional modules, each facilitating a specific aspect of the knowledgebase construction process. Using KB-builder, we constructed the Human Papillomavirus T cell Antigen Database (HPVdb). It contains 2781 curated antigen entries of antigenic proteins derived from 18 genotypes of high-risk HPV and 18 genotypes of low-risk HPV. The HPVdb also catalogs 191 verified T cell epitopes and 45 verified human leukocyte antigen (HLA) ligands. Primary amino acid sequences of HPV antigens were collected and annotated from the UniProtKB. T cell epitopes and HLA ligands were collected from data mining of scientific literature and databases. The data were subject to extensive quality control (redundancy elimination, error detection and vocabulary consolidation). A set of computational tools for an in-depth analysis, such as sequence comparison using BLAST search, multiple alignments of antigens, classification of HPV types based on cancer risk, T cell epitope/HLA ligand visualization, T cell epitope/HLA ligand conservation analysis and sequence variability analysis, has been integrated within the HPVdb. Predicted Class I and Class II HLA binding peptides for 15 common HLA alleles are included in this database as putative targets. HPVdb is a knowledge-based system that integrates curated data and information with tailored analysis tools to facilitate data mining for HPV vaccinology and immunology. To our best knowledge, HPVdb is a unique data source providing a comprehensive list of HPV antigens and peptides. Database URL: http://cvc.dfci.harvard.edu/hpv/.
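
The abstract above mentions conservation and sequence variability analysis without saying how variability is scored. A minimal sketch follows, assuming per-position Shannon entropy over pre-aligned peptides as the measure; this is a common choice, not necessarily the one HPVdb uses. The function names and the toy sequences are hypothetical and invented for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def variability_profile(aligned_sequences):
    """Return the per-position entropy for equal-length aligned sequences."""
    lengths = {len(s) for s in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*...) walks the alignment column by column.
    return [column_entropy(col) for col in zip(*aligned_sequences)]


def conserved_positions(aligned_sequences, max_entropy=0.0):
    """Indices of columns whose entropy does not exceed max_entropy."""
    profile = variability_profile(aligned_sequences)
    return [i for i, h in enumerate(profile) if h <= max_entropy]


# Toy aligned 9-mers (hypothetical, not taken from HPVdb).
peptides = ["ACDEFGHIK", "ACDEFGHIR", "ACDEFGHIK", "SCDEFGHIK"]
profile = variability_profile(peptides)
conserved = conserved_positions(peptides)
```

In this toy alignment only the first and last columns vary, so `conserved` lists the seven positions in between. Raising `max_entropy` relaxes the definition of "conserved".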

  • HPVdb: a Data Mining System for knowledge discovery in human papillomavirus with applications in T cell immunology and vaccinology
    International Conference on Bioinformatics, 2013
    Co-Authors: Guang Lan Zhang, Angelika B. Riemer, Ellis L. Reinherz, Lou Chitkushev, Derin B Keskin, Vladimir Brusic
    Abstract:

    High-risk human papilloma viruses (HPV) are the causes of many cancers, including cervical, anal, vulvar, vaginal, penile and oropharyngeal. To facilitate diagnosis, prognosis, and characterization of these cancers, we constructed the Human Papillomavirus T cell Antigen Database (HPVdb). It contains 2865 curated antigen entries of antigenic proteins derived from 18 genotypes of high-risk HPV and 18 genotypes of low-risk HPV. HPVdb also catalogs 96 verified T cell epitopes and 45 verified HLA ligands. Primary amino acid sequences of HPV antigens were collected and annotated from UniProtKB. T cell epitopes and HLA ligands were collected from data mining of scientific literature. The data were subject to extensive quality control (redundancy elimination, error detection, and vocabulary consolidation). A set of computational tools for in-depth analysis, such as sequence comparison using BLAST search, multiple alignments of antigens, classification of HPV types based on cancer risk, and T cell epitope/HLA ligand visualization, has been integrated in HPVdb. Predicted Class I and Class II HLA binding peptides for 15 common HLA alleles are included in this database as putative targets. HPVdb is a specialized database that integrates curated data and information with tailored analysis tools to facilitate data mining to aid rational vaccine design by discovery of vaccine targets. To our best knowledge, HPVdb is a unique data source providing a comprehensive list of antigen peptides in HPV. It is available at http://cvc.dfci.harvard.edu/hpv/ and http://met-hilab.bu.edu/hpvdb/.
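
The quality control steps named above (redundancy elimination, error detection, vocabulary consolidation) are not specified further in the abstract. The sketch below shows one plausible reading of each step on a toy record list. The record keys (`id`, `sequence`, `risk`), the synonym table and the rejection reasons are all assumptions made for illustration, not the HPVdb pipeline.

```python
def normalize_term(term, synonyms):
    """Vocabulary consolidation: map a free-text label to one agreed term."""
    key = " ".join(term.lower().split())
    return synonyms.get(key, key)


def quality_control(records, synonyms, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Split records into clean entries and (id, reason) rejections.

    records: list of dicts with the hypothetical keys 'id', 'sequence', 'risk'.
    """
    seen = set()
    clean, rejected = [], []
    for rec in records:
        seq = rec["sequence"].strip().upper()
        # Error detection: reject empty sequences or non-standard residues.
        if not seq or set(seq) - set(alphabet):
            rejected.append((rec["id"], "invalid sequence"))
            continue
        # Redundancy elimination: keep only the first copy of each sequence.
        if seq in seen:
            rejected.append((rec["id"], "duplicate sequence"))
            continue
        seen.add(seq)
        clean.append(
            {
                "id": rec["id"],
                "sequence": seq,
                "risk": normalize_term(rec["risk"], synonyms),
            }
        )
    return clean, rejected


# Toy input, invented for illustration.
synonyms = {"high risk": "high-risk", "hr": "high-risk", "low risk": "low-risk"}
records = [
    {"id": "r1", "sequence": "ACDEFGHIK", "risk": "High  Risk"},
    {"id": "r2", "sequence": "acdefghik", "risk": "HR"},
    {"id": "r3", "sequence": "ACDXFGHIK", "risk": "low risk"},
    {"id": "r4", "sequence": "LMNPQRSTV", "risk": "low-risk"},
]
clean, rejected = quality_control(records, synonyms)
```

Here `r2` is dropped as a case-insensitive duplicate of `r1`, and `r3` is dropped because `X` is outside the 20-letter amino acid alphabet assumed by the sketch.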

  • FLAVIdB: a Data Mining System for knowledge discovery in flaviviruses with direct applications in immunology and vaccinology
    Immunome Research, 2011
    Co-Authors: Lars Ronn Olsen, Guang Lan Zhang, Ellis L. Reinherz, Vladimir Brusic
    Abstract:

    Background: The flavivirus genus is unusually large, comprising more than 70 species, of which more than half are known human pathogens. It includes a set of clinically relevant infectious agents such as dengue, West Nile, yellow fever, and Japanese encephalitis viruses. Although these pathogens have been studied extensively, safe and efficient vaccines are lacking for the majority of the flaviviruses. Results: We have assembled a database that combines antigenic data of flaviviruses, specialized analysis tools, and workflows for automated complex analyses focusing on applications in immunology and vaccinology. FLAVIdB contains 12,858 entries of flavivirus antigen sequences, 184 verified T-cell epitopes, 201 verified B-cell epitopes, and 4 representative molecular structures of the dengue virus envelope protein. FLAVIdB was assembled by collection, annotation, and integration of data from GenBank, GenPept, UniProt, IEDB, and PDB. The data were subject to extensive quality control (redundancy elimination, error detection, and vocabulary consolidation). Further annotation of selected functionally relevant features was performed by organizing information extracted from the literature. The database was incorporated into a web-accessible data mining system, combining specialized data analysis tools for integrated analysis of relevant data categories (protein sequences, macromolecular structures, and immune epitopes). The data mining system includes tools for variability and conservation analysis, T-cell epitope prediction, and characterization of neutralizing components of B-cell epitopes. FLAVIdB is accessible at cvc.dfci.harvard.edu/flavi/. Conclusion: FLAVIdB represents a new generation of databases in which data and tools are integrated into a data mining infrastructure specifically designed to aid rational vaccine design by discovery of vaccine targets. Background: More than 70 known viral species belong to the flavivirus genus. The flavivirus genus can be divided into three clusters, fourteen clades, and 70 species [1]. The clusters are based on host-vector association: mosquito-borne, tick-borne, and no-vector viruses. The members of flavivirus clades share >69% pairwise nucleotide sequence identity, while members of individual species share >84% identity [1]. More than half of these single-stranded RNA viruses are known human pathogens [2]. The most important human pathogens among flaviviruses are West Nile virus (WNV), dengue virus (DENV), tick-borne encephalitis virus (TBEV), Japanese encephalitis virus (JEV), and yellow fever virus (YFV). Flaviviruses pose a significant global public health threat since they are responsible for hundreds of

Guang Lan Zhang - One of the best experts on this subject based on the ideXlab platform.

  • HPVdb: a Data Mining System for knowledge discovery in human papillomavirus with applications in T cell immunology and vaccinology.
    Database : the journal of biological databases and curation, 2014
    Co-Authors: Guang Lan Zhang, Angelika B. Riemer, Ellis L. Reinherz, Lou Chitkushev, Derin B Keskin, Vladimir Brusic
    Abstract:

    High-risk human papillomaviruses (HPVs) are the causes of many cancers, including cervical, anal, vulvar, vaginal, penile and oropharyngeal. To facilitate diagnosis, prognosis and characterization of these cancers, it is necessary to make full use of the immunological data on HPV available through publications, technical reports and databases. These data vary in granularity, quality and complexity. The extraction of knowledge from the vast amount of immunological data using data mining techniques remains a challenging task. To support integration of data and knowledge in virology and vaccinology, we developed a framework called KB-builder to streamline the development and deployment of web-accessible immunological knowledge systems. The framework consists of seven major functional modules, each facilitating a specific aspect of the knowledgebase construction process. Using KB-builder, we constructed the Human Papillomavirus T cell Antigen Database (HPVdb). It contains 2781 curated antigen entries of antigenic proteins derived from 18 genotypes of high-risk HPV and 18 genotypes of low-risk HPV. The HPVdb also catalogs 191 verified T cell epitopes and 45 verified human leukocyte antigen (HLA) ligands. Primary amino acid sequences of HPV antigens were collected and annotated from the UniProtKB. T cell epitopes and HLA ligands were collected from data mining of scientific literature and databases. The data were subject to extensive quality control (redundancy elimination, error detection and vocabulary consolidation). A set of computational tools for an in-depth analysis, such as sequence comparison using BLAST search, multiple alignments of antigens, classification of HPV types based on cancer risk, T cell epitope/HLA ligand visualization, T cell epitope/HLA ligand conservation analysis and sequence variability analysis, has been integrated within the HPVdb. Predicted Class I and Class II HLA binding peptides for 15 common HLA alleles are included in this database as putative targets. HPVdb is a knowledge-based system that integrates curated data and information with tailored analysis tools to facilitate data mining for HPV vaccinology and immunology. To our best knowledge, HPVdb is a unique data source providing a comprehensive list of HPV antigens and peptides. Database URL: http://cvc.dfci.harvard.edu/hpv/.

  • HPVdb: a Data Mining System for knowledge discovery in human papillomavirus with applications in T cell immunology and vaccinology
    International Conference on Bioinformatics, 2013
    Co-Authors: Guang Lan Zhang, Angelika B. Riemer, Ellis L. Reinherz, Lou Chitkushev, Derin B Keskin, Vladimir Brusic
    Abstract:

    High-risk human papilloma viruses (HPV) are the causes of many cancers, including cervical, anal, vulvar, vaginal, penile and oropharyngeal. To facilitate diagnosis, prognosis, and characterization of these cancers, we constructed the Human Papillomavirus T cell Antigen Database (HPVdb). It contains 2865 curated antigen entries of antigenic proteins derived from 18 genotypes of high-risk HPV and 18 genotypes of low-risk HPV. HPVdb also catalogs 96 verified T cell epitopes and 45 verified HLA ligands. Primary amino acid sequences of HPV antigens were collected and annotated from UniProtKB. T cell epitopes and HLA ligands were collected from data mining of scientific literature. The data were subject to extensive quality control (redundancy elimination, error detection, and vocabulary consolidation). A set of computational tools for in-depth analysis, such as sequence comparison using BLAST search, multiple alignments of antigens, classification of HPV types based on cancer risk, and T cell epitope/HLA ligand visualization, has been integrated in HPVdb. Predicted Class I and Class II HLA binding peptides for 15 common HLA alleles are included in this database as putative targets. HPVdb is a specialized database that integrates curated data and information with tailored analysis tools to facilitate data mining to aid rational vaccine design by discovery of vaccine targets. To our best knowledge, HPVdb is a unique data source providing a comprehensive list of antigen peptides in HPV. It is available at http://cvc.dfci.harvard.edu/hpv/ and http://met-hilab.bu.edu/hpvdb/.

  • FLAVIdB: a Data Mining System for knowledge discovery in flaviviruses with direct applications in immunology and vaccinology
    Immunome Research, 2011
    Co-Authors: Lars Ronn Olsen, Guang Lan Zhang, Ellis L. Reinherz, Vladimir Brusic
    Abstract:

    Background: The flavivirus genus is unusually large, comprising more than 70 species, of which more than half are known human pathogens. It includes a set of clinically relevant infectious agents such as dengue, West Nile, yellow fever, and Japanese encephalitis viruses. Although these pathogens have been studied extensively, safe and efficient vaccines are lacking for the majority of the flaviviruses. Results: We have assembled a database that combines antigenic data of flaviviruses, specialized analysis tools, and workflows for automated complex analyses focusing on applications in immunology and vaccinology. FLAVIdB contains 12,858 entries of flavivirus antigen sequences, 184 verified T-cell epitopes, 201 verified B-cell epitopes, and 4 representative molecular structures of the dengue virus envelope protein. FLAVIdB was assembled by collection, annotation, and integration of data from GenBank, GenPept, UniProt, IEDB, and PDB. The data were subject to extensive quality control (redundancy elimination, error detection, and vocabulary consolidation). Further annotation of selected functionally relevant features was performed by organizing information extracted from the literature. The database was incorporated into a web-accessible data mining system, combining specialized data analysis tools for integrated analysis of relevant data categories (protein sequences, macromolecular structures, and immune epitopes). The data mining system includes tools for variability and conservation analysis, T-cell epitope prediction, and characterization of neutralizing components of B-cell epitopes. FLAVIdB is accessible at cvc.dfci.harvard.edu/flavi/. Conclusion: FLAVIdB represents a new generation of databases in which data and tools are integrated into a data mining infrastructure specifically designed to aid rational vaccine design by discovery of vaccine targets. Background: More than 70 known viral species belong to the flavivirus genus. The flavivirus genus can be divided into three clusters, fourteen clades, and 70 species [1]. The clusters are based on host-vector association: mosquito-borne, tick-borne, and no-vector viruses. The members of flavivirus clades share >69% pairwise nucleotide sequence identity, while members of individual species share >84% identity [1]. More than half of these single-stranded RNA viruses are known human pathogens [2]. The most important human pathogens among flaviviruses are West Nile virus (WNV), dengue virus (DENV), tick-borne encephalitis virus (TBEV), Japanese encephalitis virus (JEV), and yellow fever virus (YFV). Flaviviruses pose a significant global public health threat since they are responsible for hundreds of

Padraig Cunningham - One of the best experts on this subject based on the ideXlab platform.

  • An integrated tool for microarray data clustering and cluster validity assessment
    Bioinformatics, 2005
    Co-Authors: Nadia Bolshakova, Francisco Azuaje, Padraig Cunningham
    Abstract:

    Summary: In this paper we present a data mining system, which allows the application of different clustering and cluster validity algorithms for DNA microarray data. This tool may improve the quality of the data analysis results, and may support the prediction of the number of relevant clusters in the microarray datasets. This systematic evaluation approach may significantly aid genome expression analyses for knowledge discovery applications. The developed software system may be effectively used for clustering and validating not only DNA microarray expression analysis applications but also other biomedical and physical data with no limitations. Availability: The program is freely available for non-profit use on request at http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html Contact: Nadia.Bolshakova@cs.tcd.ie
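
The summary above does not name the cluster validity algorithms the tool provides. As one illustration of what a validity index computes, the sketch below implements the Dunn index (smallest between-cluster distance divided by largest cluster diameter) in plain Python. The choice of index, the function name and the toy two-dimensional points are assumptions for illustration only. Comparing the index across candidate partitions is one way such a measure can support choosing the number of clusters.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Dunn index: minimum between-cluster distance / maximum cluster diameter.

    clusters: list of clusters, each a list of equal-length numeric tuples.
    Higher values indicate compact, well-separated clusters.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Largest distance between two points that share a cluster.
    diameter = max(
        (math.dist(a, b) for c in clusters for a, b in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between two points in different clusters.
    separation = min(
        math.dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
    if diameter == 0:
        return math.inf  # every cluster is a single point
    return separation / diameter


# Two candidate partitions of the same four toy points (hypothetical data).
good = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
bad = [[(0, 0), (5, 5)], [(0, 1), (5, 6)]]
good_score = dunn_index(good)
bad_score = dunn_index(bad)
```

The partition that groups nearby points together scores higher than the one that mixes distant points, which is the behavior a validity index is meant to capture.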

  • An integrated tool for microarray data clustering and cluster validity assessment
    ACM Symposium on Applied Computing, 2004
    Co-Authors: Nadia Bolshakova, Francisco Azuaje, Padraig Cunningham
    Abstract:

    In this paper we present a data mining system, which allows the application of different clustering and cluster validity algorithms for DNA microarray data. This tool may improve the quality of the data analysis results, and may support the prediction of the number of relevant clusters in the microarray datasets. This systematic evaluation approach may significantly aid genome expression analyses for knowledge discovery applications. The developed software system may be effectively used for clustering and validating not only DNA microarray expression analysis applications but also other biomedical and physical data with no limitations. The program is freely available for non-profit use on request at http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html

Yurong Zhong - One of the best experts on this subject based on the ideXlab platform.

  • The analysis of cases based on decision tree
    International Conference on Software Engineering, 2016
    Co-Authors: Yurong Zhong
    Abstract:

    Data mining is an intelligent data analysis technology that emerged in the late 20th century. It can automatically extract or discover useful patterns and knowledge from large amounts of data in databases, data warehouses or other data stores. Data classification is an important research topic in this field. Among the available classification methods, the decision tree algorithm is clear, easy to understand and easy to convert into classification rules, so it is widely studied and applied. Against the background of a “data platform for public petition”, this paper studies how a data mining system can be combined with an existing database to extract useful information from the characteristics hidden in the mass of data and to provide comprehensive analysis for system managers and decision makers. The paper focuses on the basic principles and basic algorithms of data mining. The case classification module was developed using a decision tree algorithm. Based on an improved ID3 decision tree algorithm, a decision tree model is built from the case information in one library and the client information in another library, and it gives a comprehensive assessment of a given case. The paper presents a simplified entropy-weight algorithm based on ID3. The main idea is first to combine the Taylor formula with the entropy calculation that ID3 uses for attribute selection, which simplifies the entropy calculation, changes the attribute selection criterion, reduces computational complexity and improves running efficiency. Each attribute's simplified entropy is then given a weight N, which depends on the number of values the attribute takes, to balance the uncertainty of each attribute on the data set. This makes attribute selection more reasonable and avoids compatibility problems with real attributes.
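
The abstract does not give the simplified formula, so the sketch below is only one way a Taylor-based simplification of ID3's entropy could look. It uses the first-order expansion ln(p) ≈ -(1 - p), which replaces each logarithm with a multiplication. The function names, the toy case records and the omission of the weight N are all assumptions; this is not the paper's algorithm.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Standard Shannon entropy (bits), as used by classic ID3."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def taylor_entropy(labels):
    """Log-free approximation: ln(p) ~ -(1 - p), so H ~ sum p(1 - p) / ln 2."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values()) / math.log(2)


def split_score(rows, labels, attribute, impurity):
    """Weighted impurity of the labels after splitting on one attribute."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attribute]].append(label)
    n = len(labels)
    return sum(len(g) / n * impurity(g) for g in groups.values())


def best_attribute(rows, labels, attributes, impurity=entropy):
    """Pick the attribute that leaves the least impurity after the split.

    With impurity=entropy this is ID3's maximum information gain rule.
    """
    return min(attributes, key=lambda a: split_score(rows, labels, a, impurity))


# Toy case records with hypothetical attribute names and labels.
rows = [
    {"category": "land", "repeat": "yes"},
    {"category": "land", "repeat": "no"},
    {"category": "labor", "repeat": "yes"},
    {"category": "labor", "repeat": "no"},
]
labels = ["urgent", "urgent", "routine", "routine"]
exact_choice = best_attribute(rows, labels, ["category", "repeat"])
approx_choice = best_attribute(rows, labels, ["category", "repeat"], taylor_entropy)
```

On this toy data both criteria select `category`, which separates the labels perfectly. In general the approximation can rank attributes differently from exact entropy.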

Xuetong Xie - One of the best experts on this subject based on the ideXlab platform.

  • A problem-oriented approach to data mining in distributed spatio-temporal database
    International Conference on Conceptual Structures, 2007
    Co-Authors: Zhou Huang, Xia Peng, Bin Chen, Yu Fang, Xuetong Xie
    Abstract:

    The volume of spatio-temporal data has grown rapidly in recent years and, more importantly, the data may be distributed across many locations. There is therefore a need for spatio-temporal data mining systems that can support distributed spatio-temporal query and analysis operations. This paper discusses distributed spatio-temporal data mining technologies. After discussing the process of spatio-temporal data mining in a distributed environment, a working DSTDMS (Distributed Spatio-Temporal Data Mining System) was designed and implemented. The system is based on the sequent snapshot data model and is realized through a spatio-temporal extension of PostgreSQL. Various spatio-temporal analyses and mining queries can be carried out in the system through simple SQL statements. Using the system, effective mining of distributed spatio-temporal data was achieved.
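
To make the snapshot idea concrete, the sketch below stores one row per object per time step and answers a spatial window query and a simple temporal query with plain SQL. It is an illustration under stated assumptions: SQLite (via Python's standard `sqlite3` module) stands in for PostgreSQL, the table and column names are hypothetical, and neither the paper's spatio-temporal extension nor its distributed execution is modeled.

```python
import sqlite3

# In-memory database; each row is one object's position in one snapshot.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshot (obj_id TEXT, t INTEGER, x REAL, y REAL, "
    "PRIMARY KEY (obj_id, t))"
)
conn.executemany(
    "INSERT INTO snapshot VALUES (?, ?, ?, ?)",
    [
        ("a", 1, 0.0, 0.0), ("a", 2, 1.0, 1.0), ("a", 3, 4.0, 4.0),
        ("b", 1, 5.0, 5.0), ("b", 2, 5.0, 5.5), ("b", 3, 5.0, 6.0),
    ],
)


def objects_in_window(conn, t, xmin, ymin, xmax, ymax):
    """Spatial query: objects inside a bounding box at snapshot time t."""
    rows = conn.execute(
        "SELECT obj_id FROM snapshot "
        "WHERE t = ? AND x BETWEEN ? AND ? AND y BETWEEN ? AND ? "
        "ORDER BY obj_id",
        (t, xmin, xmax, ymin, ymax),
    )
    return [r[0] for r in rows]


def displacement(conn, t_start, t_end):
    """Temporal query: per-object movement between two snapshots (self-join)."""
    rows = conn.execute(
        "SELECT s.obj_id, e.x - s.x, e.y - s.y "
        "FROM snapshot AS s JOIN snapshot AS e ON s.obj_id = e.obj_id "
        "WHERE s.t = ? AND e.t = ? ORDER BY s.obj_id",
        (t_start, t_end),
    )
    return {obj: (dx, dy) for obj, dx, dy in rows}


early = objects_in_window(conn, 1, 3.0, 3.0, 6.0, 6.0)
late = objects_in_window(conn, 3, 3.0, 3.0, 6.0, 6.0)
moved = displacement(conn, 1, 3)
```

Object `a` enters the query window only by the third snapshot, so `early` and `late` differ. A dedicated spatial extension would typically replace the `BETWEEN` predicates with indexed geometry operators.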