Data Classification

The experts below are selected from a list of 869,562 experts worldwide, ranked by the ideXlab platform.

Francisco Herrera - One of the best experts on this subject based on the ideXlab platform.

  • Redundancy and Complexity Metrics for Big Data Classification Towards Smart Data
    IEEE Access, 2020
    Co-Authors: Jesus Maillo, Isaac Triguero, Francisco Herrera
    Abstract:

    The importance of knowing the descriptive properties of a dataset when tackling a data science problem is widely recognized. Having information about the redundancy, complexity and density of a problem allows us to decide which data preprocessing and machine learning techniques are most suitable. In classification problems, there are multiple metrics to describe the overlapping of features between classes, class imbalance or separability, among others. However, these metrics may not scale up well when dealing with big datasets, or may simply not be sufficiently informative in this context. In this paper, we provide a package of metrics for big data classification problems. In particular, we propose two new big data metrics: Neighborhood Density and Decision Tree Progression, which study density and the progression of accuracy when half of the samples are discarded. In addition, we enable a number of basic metrics to handle big data. The experimental study, carried out on standard big data classification problems, shows that our metrics can quickly characterize big datasets. We identified a clear redundancy of information in most datasets, such that randomly discarding 75% of the samples does not drastically affect the accuracy of the classifiers used. Thus, the proposed big data metrics, which are available as a Spark package, provide a fast assessment of the shape of a classification dataset prior to applying big data preprocessing, toward smart data.
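
    The Decision Tree Progression metric is not specified in detail in the abstract; the sketch below only illustrates the underlying idea of accuracy progression under subsampling: train a decision tree on progressively halved random subsets and watch how test accuracy degrades. It uses scikit-learn on synthetic data; the function name accuracy_progression and all parameters are illustrative assumptions, not the authors' Spark package.

```python
# Hedged sketch of the "accuracy progression under subsampling" idea
# described above; it is NOT the authors' Spark implementation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def accuracy_progression(X, y, halvings=4, seed=0):
    """Train a decision tree on progressively halved training samples.

    Returns a list of (fraction_of_data, test_accuracy) pairs; a nearly
    flat curve hints at redundant data (illustrative reading only).
    """
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    rng = np.random.default_rng(seed)
    results = []
    n = len(X_tr)
    for step in range(halvings + 1):
        frac = 0.5 ** step
        idx = rng.choice(n, size=max(1, int(n * frac)), replace=False)
        clf = DecisionTreeClassifier(random_state=seed).fit(X_tr[idx], y_tr[idx])
        results.append((frac, accuracy_score(y_te, clf.predict(X_te))))
    return results

if __name__ == "__main__":
    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    for frac, acc in accuracy_progression(X, y):
        print(f"{frac:5.3f} of training data -> accuracy {acc:.3f}")
```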

  • Enabling Smart Data: Noise Filtering in Big Data Classification
    Information Sciences, 2019
    Co-Authors: Diego García-Gil, Francisco Herrera, Julián Luengo, Salvador García
    Abstract:

    In any knowledge discovery process, the value of the extracted knowledge is directly related to the quality of the data used. Big data problems, generated by the massive growth in the scale of data observed in recent years, follow the same dictate. A common problem affecting data quality is the presence of noise, particularly in classification problems, where label noise refers to the incorrect labeling of training instances and is known to be a very disruptive feature of data. However, in this big data era, the massive growth in the scale of the data poses a challenge to traditional proposals created to tackle noise, as they have difficulties coping with such a large amount of data. New algorithms need to be proposed to treat noise in big data problems, providing high-quality and clean data, also known as smart data. In this paper, two big data preprocessing approaches to remove noisy examples are proposed: a homogeneous ensemble filter and a heterogeneous ensemble filter, with special emphasis on their scalability and performance traits. The obtained results show that these proposals enable the practitioner to efficiently obtain a smart dataset from any big data classification problem.
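
    As a rough illustration of ensemble noise filtering, the sketch below flags training instances whose labels disagree with the majority of cross-validated predictions from several different classifiers, which is one common reading of a heterogeneous ensemble filter. It is a single-machine scikit-learn sketch with an illustrative function name (heterogeneous_noise_filter), not the authors' scalable Spark implementation.

```python
# Hedged sketch of ensemble-based label-noise filtering (majority vote);
# a simplified, single-machine illustration, not the authors' Spark code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

def heterogeneous_noise_filter(X, y, voters=None, cv=5):
    """Flag instances whose label disagrees with the majority of
    cross-validated predictions from several different classifiers."""
    if voters is None:
        voters = [DecisionTreeClassifier(random_state=0),
                  KNeighborsClassifier(n_neighbors=5),
                  LogisticRegression(max_iter=1000)]
    votes = np.stack([cross_val_predict(clf, X, y, cv=cv) for clf in voters])
    wrong = (votes != y).sum(axis=0)      # voters disagreeing with the given label
    noisy = wrong > len(voters) / 2       # majority disagreement -> suspected noise
    return X[~noisy], y[~noisy], np.where(noisy)[0]

if __name__ == "__main__":
    X, y = make_classification(n_samples=1000, flip_y=0.1, random_state=0)
    X_clean, y_clean, noisy_idx = heterogeneous_noise_filter(X, y)
    print(f"removed {len(noisy_idx)} suspected noisy instances of {len(y)}")
```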

  • A MapReduce-Based k-Nearest Neighbor Approach for Big Data Classification
    Trust Security And Privacy In Computing And Communications, 2015
    Co-Authors: Jesus Maillo, Isaac Triguero, Francisco Herrera
    Abstract:

    The k-nearest neighbor classifier is one of the best-known methods in data mining because of its effectiveness and simplicity. Due to the way it works, the application of this classifier may be restricted to problems with a certain number of examples, especially when runtime matters. However, the classification of large amounts of data is becoming a necessary task in a great number of real-world applications. This topic is known as big data classification, in which standard data mining techniques normally fail to tackle such a volume of data. In this contribution we propose a MapReduce-based approach for k-nearest neighbor classification. This model allows us to simultaneously classify large amounts of unseen cases (test examples) against a big (training) dataset. To do so, the map phase determines the k nearest neighbors in different splits of the data. Afterwards, the reduce stage computes the definitive neighbors from the lists obtained in the map phase. The designed model allows the k-nearest neighbor classifier to scale to datasets of arbitrary size, simply by adding more computing nodes if necessary. Moreover, this parallel implementation provides exactly the same classification rate as the original k-NN model. The experiments conducted, using a dataset with up to 1 million instances, show the promising scalability of the proposed approach.
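
    The map/reduce decomposition described in the abstract can be sketched in plain Python: each mapper returns, for every test point, the k nearest candidates found within its split of the training data, and the reducer merges those candidate lists to keep the global k nearest before voting. This is a minimal single-process illustration under those assumptions, not the authors' Hadoop/Spark code.

```python
# Hedged sketch of the map/reduce decomposition of exact k-NN described
# above; plain Python, not the authors' Hadoop/Spark implementation.
import numpy as np

def knn_map(train_split, train_labels, test_points, k):
    """Mapper: for one split of the training data, return the k nearest
    candidates (distance, label) for every test point."""
    candidates = []
    for x in test_points:
        d = np.linalg.norm(train_split - x, axis=1)
        nearest = np.argsort(d)[:k]
        candidates.append(list(zip(d[nearest], train_labels[nearest])))
    return candidates

def knn_reduce(per_split_candidates, k):
    """Reducer: merge candidate lists from all splits, keep the global
    k nearest neighbors per test point, then majority vote."""
    predictions = []
    for lists_for_point in zip(*per_split_candidates):
        merged = sorted(pair for lst in lists_for_point for pair in lst)[:k]
        labels = [label for _, label in merged]
        predictions.append(max(set(labels), key=labels.count))
    return predictions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5)); y = (X[:, 0] > 0).astype(int)
    test = rng.normal(size=(10, 5))
    splits = np.array_split(np.arange(len(X)), 4)   # 4 "map" partitions
    partials = [knn_map(X[s], y[s], test, k=5) for s in splits]
    print(knn_reduce(partials, k=5))
```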

  • A MapReduce Approach to Address Big Data Classification Problems Based on the Fusion of Linguistic Fuzzy Rules
    International Journal of Computational Intelligence Systems, 2015
    Co-Authors: Sara del Río, Victoria López, José Manuel Benítez, Francisco Herrera
    Abstract:

    The term big data is used to describe the exponential data growth that has recently occurred and represents an immense challenge for traditional learning techniques. To deal with big data classification problems, we propose the Chi-FRBCS-BigData algorithm, a linguistic fuzzy rule-based classification system that uses the MapReduce framework to learn and fuse rule bases. It has been developed in two versions with different fusion processes. An experimental study is carried out, and the results obtained show that the proposal is able to handle these problems while providing competitive results.
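
    The abstract does not describe the two fusion processes in detail; the sketch below shows one plausible, generic way to fuse linguistic rule bases learned on different data partitions, resolving rules that share an antecedent by keeping the consequent with the highest rule weight. It should not be read as either of the Chi-FRBCS-BigData fusion variants; the data structures and function name are illustrative assumptions.

```python
# Hedged, generic sketch of fusing rule bases learned on different data
# partitions; the concrete Chi-FRBCS-BigData fusion schemes may differ.

def fuse_rule_bases(rule_bases):
    """Each rule base maps an antecedent (tuple of linguistic labels, one
    per feature) to a (class_label, rule_weight) pair. Rules sharing the
    same antecedent are resolved by keeping the highest-weight consequent."""
    fused = {}
    for rb in rule_bases:
        for antecedent, (cls, weight) in rb.items():
            if antecedent not in fused or weight > fused[antecedent][1]:
                fused[antecedent] = (cls, weight)
    return fused

if __name__ == "__main__":
    # Two toy rule bases over two features with linguistic labels LOW/HIGH.
    rb1 = {("LOW", "LOW"): ("negative", 0.8), ("HIGH", "LOW"): ("positive", 0.6)}
    rb2 = {("LOW", "LOW"): ("positive", 0.5), ("LOW", "HIGH"): ("positive", 0.9)}
    print(fuse_rule_bases([rb1, rb2]))
```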

Jose Moreno - One of the best experts on this subject based on the ideXlab platform.

  • Robust Support Vector Method for Hyperspectral Data Classification and Knowledge Discovery
    IEEE Transactions on Geoscience and Remote Sensing, 2004
    Co-Authors: Gustau Camps-Valls, Luis Gómez-Chova, Javier Calpe-Maravilla, José David Martín-Guerrero, Emilio Soria-Olivas, L. Alonso-Chordá, Jose Moreno
    Abstract:

    We propose the use of support vector machines (SVMs) for automatic hyperspectral data classification and knowledge discovery. In the first stage of the study, we use SVMs for crop classification and analyze their performance in terms of efficiency and robustness, as compared to extensively used neural and fuzzy methods. Efficiency is assessed by evaluating accuracy and statistical differences in several scenes. Robustness is analyzed in terms of: (1) suitability to working conditions when a feature selection stage is not possible, and (2) performance when different levels of Gaussian noise are introduced at their inputs. In the second stage of this work, we analyze the distribution of the support vectors (SVs) and perform sensitivity analysis on the best classifier in order to analyze the significance of the input spectral bands. For classification purposes, six hyperspectral images acquired with the 128-band HyMAP spectrometer during the DAISEX-1999 campaign are used. Six crop classes were labeled for each image. A reduced set of labeled samples is used to train the models, and the entire images are used to assess their performance. Several conclusions are drawn: (1) SVMs yield better outcomes than neural networks regarding accuracy, simplicity, and robustness; (2) training neural and neurofuzzy models is unfeasible when working with high-dimensional input spaces and great amounts of training data; (3) SVMs perform similarly for different training subsets with varying input dimension, which indicates that noisy bands are successfully detected; and (4) a valuable ranking of bands through sensitivity analysis is achieved.
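
    The two robustness checks described above, degradation under additive Gaussian input noise and a ranking of input bands, can be approximated with scikit-learn as sketched below on synthetic data. Permutation importance is used here as a simple stand-in for the paper's sensitivity analysis; the data, model settings, and ranking method are illustrative assumptions, not the original DAISEX-1999 study setup.

```python
# Hedged sketch of the robustness/sensitivity checks described above,
# using scikit-learn and synthetic data, not the DAISEX-1999 imagery.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf", C=10, gamma="scale").fit(X_tr, y_tr)

# Robustness: accuracy under increasing levels of Gaussian input noise.
rng = np.random.default_rng(0)
for sigma in (0.0, 0.1, 0.5, 1.0):
    noisy = X_te + rng.normal(scale=sigma, size=X_te.shape)
    print(f"noise sigma={sigma:.1f} -> accuracy {clf.score(noisy, y_te):.3f}")

# Band ranking: permutation importance as a simple stand-in for the
# paper's sensitivity analysis of input spectral bands.
imp = permutation_importance(clf, X_te, y_te, n_repeats=5, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]
print("most influential input bands (indices):", ranking[:5])
```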

Gustau Camps-Valls - One of the best experts on this subject based on the ideXlab platform.

  • Robust Support Vector Method for Hyperspectral Data Classification and Knowledge Discovery
    IEEE Transactions on Geoscience and Remote Sensing, 2004
    Co-Authors: Gustau Camps-Valls, Luis Gómez-Chova, Javier Calpe-Maravilla, José David Martín-Guerrero, Emilio Soria-Olivas, L. Alonso-Chordá, Jose Moreno
    Abstract:

    We propose the use of support vector machines (SVMs) for automatic hyperspectral data classification and knowledge discovery. In the first stage of the study, we use SVMs for crop classification and analyze their performance in terms of efficiency and robustness, as compared to extensively used neural and fuzzy methods. Efficiency is assessed by evaluating accuracy and statistical differences in several scenes. Robustness is analyzed in terms of: (1) suitability to working conditions when a feature selection stage is not possible, and (2) performance when different levels of Gaussian noise are introduced at their inputs. In the second stage of this work, we analyze the distribution of the support vectors (SVs) and perform sensitivity analysis on the best classifier in order to analyze the significance of the input spectral bands. For classification purposes, six hyperspectral images acquired with the 128-band HyMAP spectrometer during the DAISEX-1999 campaign are used. Six crop classes were labeled for each image. A reduced set of labeled samples is used to train the models, and the entire images are used to assess their performance. Several conclusions are drawn: (1) SVMs yield better outcomes than neural networks regarding accuracy, simplicity, and robustness; (2) training neural and neurofuzzy models is unfeasible when working with high-dimensional input spaces and great amounts of training data; (3) SVMs perform similarly for different training subsets with varying input dimension, which indicates that noisy bands are successfully detected; and (4) a valuable ranking of bands through sensitivity analysis is achieved.

Stephen Marshall - One of the best experts on this subject based on the ideXlab platform.

  • Novel Two-Dimensional Singular Spectrum Analysis for Effective Feature Extraction and Data Classification in Hyperspectral Imaging
    IEEE Transactions on Geoscience and Remote Sensing, 2015
    Co-Authors: Jaime Zabalza, Jiangbin Zheng, Huimin Zhao, Shutao Li, Stephen Marshall
    Abstract:

    Feature extraction is of high importance for effective data classification in hyperspectral imaging (HSI). Considering the high correlation among band images, spectral-domain feature extraction is widely employed. For effective spatial information extraction, a 2-D extension to singular spectrum analysis (SSA), a recent technique for generic data mining and temporal signal analysis, is proposed. With 2D-SSA applied to HSI, each band image is decomposed into varying trends, oscillations, and noise. Using the trend and selected oscillations as features, the reconstructed signal, with noise highly suppressed, becomes more robust and effective for data classification. Three publicly available data sets for HSI remote sensing data classification are used in our experiments. Comprehensive results using a support vector machine (SVM) classifier have quantitatively evaluated the efficacy of the proposed approach. Benchmarked against several state-of-the-art methods, including 2-D empirical mode decomposition (2D-EMD), our proposed 2D-SSA approach is found to generate the best results in most cases. Unlike 2D-EMD, which requires sequential transforms to obtain a detailed decomposition, 2D-SSA extracts all components simultaneously. As a result, the execution time in feature extraction can also be dramatically reduced. The superiority of 2D-SSA in terms of enhanced discrimination ability is further validated when a relatively weak classifier, k-nearest neighbor (k-NN), is used for data classification. In addition, the combination of 2D-SSA with 1D-PCA (2D-SSA-PCA) has generated the best results among several other approaches, demonstrating the great potential of combining 2D-SSA with other approaches for effective spatial-spectral feature extraction and dimension reduction in HSI.
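
    The decomposition principle behind SSA (embed the signal into a trajectory matrix, take its SVD, group components into trend, oscillations and noise, and reconstruct by diagonal averaging) can be sketched in 1-D as below. The paper's contribution is the 2-D extension applied to whole band images, which this illustrative snippet does not reproduce; the function name and parameters are assumptions.

```python
# Hedged 1-D SSA sketch (embed -> SVD -> group -> diagonal averaging),
# illustrating the decomposition principle; not the authors' 2D-SSA code.
import numpy as np

def ssa_reconstruct(x, window, components):
    """Reconstruct signal x from the selected SVD components of its
    trajectory matrix (e.g. trend + a few oscillations, noise dropped)."""
    n = len(x)
    k = n - window + 1
    # Embedding: trajectory matrix of lagged vectors (window x k).
    traj = np.column_stack([x[i:i + window] for i in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Grouping: keep only the selected elementary matrices.
    approx = sum(s[i] * np.outer(u[:, i], vt[i]) for i in components)
    # Diagonal (Hankel) averaging back to a 1-D signal.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for row in range(window):
        for col in range(k):
            recon[row + col] += approx[row, col]
            counts[row + col] += 1
    return recon / counts

if __name__ == "__main__":
    t = np.linspace(0, 10, 500)
    noisy = np.sin(t) + 0.1 * t + 0.3 * np.random.default_rng(0).normal(size=t.size)
    smooth = ssa_reconstruct(noisy, window=50, components=[0, 1, 2])
    print("residual std after dropping noise components:",
          round(float(np.std(noisy - smooth)), 3))
```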

  • Novel Two-Dimensional Singular Spectrum Analysis for Effective Feature Extraction and Data Classification in Hyperspectral Imaging
    IEEE Transactions on Geoscience and Remote Sensing, 2015
    Co-Authors: Jaime Zabalza, Junwei Han, Jiangbin Zheng, Huimin Zhao, Jinchang Ren, Stephen Marshall
    Abstract:

    Feature extraction is of high importance for effective data classification in hyperspectral imaging (HSI). Considering the high correlation among band images, spectral-domain feature extraction is widely employed. For effective spatial information extraction, a 2-D extension to singular spectrum analysis (2D-SSA), which is a recent technique for generic data mining and temporal signal analysis, is proposed. With 2D-SSA applied to HSI, each band image is decomposed into varying trends, oscillations, and noise. Using the trend and the selected oscillations as features, the reconstructed signal, with noise highly suppressed, becomes more robust and effective for data classification. Three publicly available data sets for HSI remote sensing data classification are used in our experiments. Comprehensive results using a support vector machine classifier have quantitatively evaluated the efficacy of the proposed approach. Benchmarked against several state-of-the-art methods, including 2-D empirical mode decomposition (2D-EMD), our proposed 2D-SSA approach is found to generate the best results in most cases. Unlike 2D-EMD, which requires sequential transforms to obtain a detailed decomposition, 2D-SSA extracts all components simultaneously. As a result, the execution time in feature extraction can also be dramatically reduced. The superiority in terms of enhanced discrimination ability from 2D-SSA is further validated when a relatively weak classifier, i.e., the k-nearest neighbor, is used for data classification. In addition, the combination of 2D-SSA with 1-D principal component analysis (2D-SSA-PCA) has generated the best results among several other approaches, demonstrating the great potential of combining 2D-SSA with other approaches for effective spatial-spectral feature extraction and dimension reduction in HSI.

Zhigang Gao - One of the best experts on this subject based on the ideXlab platform.

  • A Hybrid Feature Selection Algorithm for Gene Expression Data Classification
    Neurocomputing, 2017
    Co-Authors: Huijuan Lu, Ke Yan, Qun Jin, Junying Chen, Yu Xue, Zhigang Gao
    Abstract:

    In the DNA microarray research field, the increasing sample size and feature dimension of gene expression data prompt the development of efficient and robust feature selection algorithms for gene expression data classification. In this study, we propose a hybrid feature selection algorithm that combines mutual information maximization (MIM) and an adaptive genetic algorithm (AGA). Experimental results show that the proposed MIMAGA-Selection method significantly reduces the dimension of gene expression data and removes redundancies for classification. The reduced gene expression dataset provides the highest classification accuracy compared to conventional feature selection algorithms. We also apply four different classifiers to the reduced dataset to demonstrate the robustness of the proposed MIMAGA-Selection algorithm.
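
    A hybrid of a mutual-information pre-filter with a genetic-algorithm search can be sketched as below: rank features by mutual information, then let a simple GA search subsets of the top-ranked features using cross-validated accuracy as fitness. The adaptive crossover/mutation scheduling of the paper's AGA is omitted, and all function names and parameters here are illustrative assumptions, not the MIMAGA-Selection implementation.

```python
# Hedged sketch of a hybrid MI pre-filter + simple genetic algorithm for
# feature selection; the paper's adaptive GA operators are not reproduced.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Cross-validated accuracy of a k-NN classifier on the selected genes."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

def ga_select(X, y, pop_size=20, generations=15, p_mut=0.05):
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5           # random boolean feature masks
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]       # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut          # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[int(np.argmax(scores))]

if __name__ == "__main__":
    X, y = make_classification(n_samples=150, n_features=500, n_informative=15,
                               random_state=0)
    # Stage 1: mutual-information pre-filter keeps the top-ranked features.
    mi = mutual_info_classif(X, y, random_state=0)
    top = np.argsort(mi)[::-1][:50]
    # Stage 2: GA searches subsets of the pre-filtered features.
    best_mask = ga_select(X[:, top], y)
    print("selected", int(best_mask.sum()), "of", len(top), "pre-filtered genes")
```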