Dimension Reduction

The Experts below are selected from a list of 76,017 Experts worldwide, ranked by the ideXlab platform.

Haesun Park - One of the best experts on this subject based on the ideXlab platform.

  • Dimension Reduction in text classification with support vector machines
    Journal of Machine Learning Research, 2005
    Co-Authors: Peg Howland, Haesun Park
    Abstract:

    Support vector machines (SVMs) have been recognized as one of the most successful classification methods for many applications, including text classification. Even though the learning ability and computational complexity of training support vector machines may be independent of the Dimension of the feature space, reducing computational cost remains essential for efficiently handling the large number of terms that arise in practical text classification applications. In this paper, we adopt novel Dimension Reduction methods to reduce the Dimension of the document vectors dramatically. We also introduce decision functions for the centroid-based classification algorithm and for support vector classifiers to handle the classification problem in which a document may belong to multiple classes. Our extensive experimental results show that with several Dimension Reduction methods designed particularly for clustered data, higher efficiency for both training and testing can be achieved without sacrificing the prediction accuracy of text classification, even when the Dimension of the input space is significantly reduced.
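
    As a rough, hedged illustration of the kind of pipeline described above, the sketch below reduces TF-IDF document vectors to the space spanned by an orthonormal basis of the class centroids (one of the simplest Dimension Reduction schemes designed for clustered data) and then trains a linear SVM in the reduced space. The corpus, the QR-based centroid projection, and all parameter choices are illustrative assumptions, not the authors' exact methods.

    ```python
    # A minimal sketch, not the paper's exact method: centroid-based dimension
    # reduction (project documents onto an orthonormal basis of the k class
    # centroids) followed by a linear SVM in the reduced k-dimensional space.
    import numpy as np
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    # Illustrative corpus (assumption): a small subset of 20 Newsgroups.
    data = fetch_20newsgroups(subset="train",
                              categories=["sci.space", "rec.autos", "sci.med"])
    X = TfidfVectorizer(max_features=5000).fit_transform(data.data).toarray()
    y = np.array(data.target)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # Centroid matrix C: one column per class, each column a mean document vector.
    classes = np.unique(y_tr)
    C = np.column_stack([X_tr[y_tr == c].mean(axis=0) for c in classes])  # (terms, k)

    # Orthonormal basis for span(C) via thin QR; project documents to k dimensions.
    Q, _ = np.linalg.qr(C)            # Q: (terms, k)
    Z_tr, Z_te = X_tr @ Q, X_te @ Q   # reduced k-dimensional representations

    clf = LinearSVC(max_iter=10000).fit(Z_tr, y_tr)
    print("accuracy in the reduced space:", clf.score(Z_te, y_te))
    ```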

  • IDR/QR: an incremental Dimension Reduction algorithm via QR decomposition
    IEEE Transactions on Knowledge and Data Engineering, 2005
    Co-Authors: Hui Xiong, Haesun Park, Ravi Janardan, Vipin Kumar
    Abstract:

    Dimension Reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-Dimensional data. In the literature, a well-known Dimension Reduction algorithm is linear discriminant analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of singular value decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental Dimension Reduction algorithm, called IDR/QR, which applies QR decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced Dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically.
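
    A rough sketch of the batch (non-incremental) phase of an IDR/QR-style reduction, assuming a dense data matrix in memory: QR-factorize the class-centroid matrix, project the data into the k-dimensional span of the centroids, and solve the small k-by-k regularized discriminant eigenproblem there. The incremental QR-updating step that gives the algorithm its efficiency is omitted, and the regularization constant and synthetic data are illustrative assumptions.

    ```python
    # Sketch of the batch phase of an IDR/QR-style reduction; the incremental
    # QR-updating machinery described in the paper is omitted here.
    import numpy as np

    def idr_qr_batch(X, y, mu=1.0):
        """X: (n, d) data matrix, y: (n,) class labels. Returns a (d, k) transform."""
        classes = np.unique(y)
        k = len(classes)

        # Class-centroid matrix C (d, k) and its thin QR factorization.
        C = np.column_stack([X[y == c].mean(axis=0) for c in classes])
        Q, _ = np.linalg.qr(C)              # Q: (d, k) orthonormal basis of span(C)

        # Work in the reduced k-dimensional space Z = X Q.
        Z = X @ Q
        zbar = Z.mean(axis=0)
        Sw = np.zeros((k, k))               # reduced within-class scatter
        Sb = np.zeros((k, k))               # reduced between-class scatter
        for c in classes:
            Zc = Z[y == c]
            mc = Zc.mean(axis=0)
            Sw += (Zc - mc).T @ (Zc - mc)
            Sb += len(Zc) * np.outer(mc - zbar, mc - zbar)

        # Small regularized discriminant eigenproblem, solved in k dimensions only.
        evals, G = np.linalg.eig(np.linalg.solve(Sw + mu * np.eye(k), Sb))
        order = np.argsort(np.real(evals))[::-1]
        G = np.real(G[:, order])
        return Q @ G                         # (d, k); reduce new data via X_new @ W

    # Illustrative usage on synthetic data (assumption):
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(m, 1.0, size=(50, 30)) for m in (0.0, 1.5, 3.0)])
    y = np.repeat([0, 1, 2], 50)
    W = idr_qr_batch(X, y)
    print("reduced training data shape:", (X @ W).shape)   # (150, 3)
    ```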

  • IDR/QR: an incremental Dimension Reduction algorithm via QR decomposition
    Knowledge Discovery and Data Mining, 2004
    Co-Authors: Hui Xiong, Haesun Park, Ravi Janardan, Vipin Kumar
    Abstract:

    Dimension Reduction is critical for many database and data mining applications, such as efficient storage and retrieval of high-Dimensional data. In the literature, a well-known Dimension Reduction scheme is Linear Discriminant Analysis (LDA). The common aspect of previously proposed LDA based algorithms is the use of Singular Value Decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there is little work on designing incremental LDA algorithms. In this paper, we propose an LDA based incremental Dimension Reduction algorithm, called IDR/QR, which applies QR Decomposition rather than SVD. Unlike other LDA based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification accuracy on the reduced Dimensional space. Our experiments on several real-world data sets reveal that the accuracy achieved by the IDR/QR algorithm is very close to the best possible accuracy achieved by other LDA based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are dynamically inserted.

Andreas Artemiou - One of the best experts on this subject based on the ideXlab platform.

  • Nonlinear Dimension Reduction for conditional quantiles
    Advances in Data Analysis and Classification, 2021
    Co-Authors: Eliana Christou, Annabel Settle, Andreas Artemiou
    Abstract:

    In practice, data often display heteroscedasticity, making quantile regression (QR) a more appropriate methodology. Modeling the data while maintaining a flexible nonparametric fit requires smoothing over a high-Dimensional space, which might not be feasible when the number of predictor variables is large. This necessitates Dimension Reduction techniques for conditional quantiles, which extract linear combinations of the predictor variables without losing any information about the conditional quantile. However, nonlinear features can achieve greater Dimension Reduction. We therefore present the first nonlinear extension of the linear algorithm for estimating the central quantile subspace (CQS) using kernel data. First, we describe the feature CQS within the framework of reproducing kernel Hilbert spaces, and second, we illustrate its performance through simulation examples and real data applications. Specifically, we emphasize visualizing various aspects of the data structure using the first two feature extractors, and we highlight the ability to combine the proposed algorithm with linear classification and regression algorithms. The results show that the feature CQS is an effective kernel tool for performing nonlinear Dimension Reduction for conditional quantiles.
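
    The linear idea being extended can be illustrated crudely: under a heteroscedastic single-index model, a linear quantile regression already extracts one linear combination of the predictors that carries the conditional-quantile information. The sketch below is only this linear, single-direction toy (via statsmodels' QuantReg), not the kernel/RKHS feature-CQS estimator of the paper; the simulated model and the quantile level are assumptions.

    ```python
    # Toy illustration (not the paper's kernel CQS estimator): extract a single
    # linear combination of the predictors for a conditional quantile by fitting
    # a linear quantile regression and normalizing its coefficient vector.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n, p = 500, 5
    beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0])    # true index direction (assumption)
    X = rng.normal(size=(n, p))
    index = X @ beta
    y = index + (0.5 + 0.5 * np.abs(index)) * rng.normal(size=n)  # heteroscedastic noise

    tau = 0.75
    fit = sm.QuantReg(y, sm.add_constant(X)).fit(q=tau)
    direction = fit.params[1:]                      # drop the intercept
    direction = direction / np.linalg.norm(direction)

    # Alignment of the extracted direction with the true index direction.
    cosine = abs(direction @ beta) / np.linalg.norm(beta)
    print(f"tau={tau}: |cos(angle to true direction)| = {cosine:.3f}")
    ```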

  • A study on imbalance support vector machine algorithms for sufficient Dimension Reduction
    Communications in Statistics - Theory and Methods, 2017
    Co-Authors: Luke Smallman, Andreas Artemiou
    Abstract:

    Li et al. (2011) presented the novel idea of using support vector machines (SVMs) to perform sufficient Dimension Reduction. In this work, we investigate the potential improvement in recovering the Dimension Reduction subspace when the SVM algorithm is changed to treat imbalance, based on several proposals in the machine learning literature. We find that, in most situations, treating the imbalanced nature of the slices helps improve the estimation. Our results are verified through simulation and real data applications.
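
    One common way to treat the imbalance in a per-slice SVM fit is to reweight the two slice labels inversely to their frequencies. The snippet below shows that adjustment in isolation for a single extreme slicing point, using scikit-learn's class_weight="balanced" option; it is a hedged illustration of the general idea, not the specific reweighting schemes compared in the paper, and the simulated data are an assumption.

    ```python
    # Reweighting an imbalanced slice in an SVM-based SDR step: slicing the
    # response at an extreme quantile makes the two labels unbalanced, and
    # class_weight="balanced" reweights them inversely to their frequencies.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(2)
    X = rng.normal(size=(400, 6))
    y = X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=400)  # toy regression model (assumption)

    cut = np.quantile(y, 0.9)               # extreme slicing point -> imbalanced labels
    labels = (y > cut).astype(int)

    svm_plain = LinearSVC(C=1.0, max_iter=10000).fit(X, labels)
    svm_weighted = LinearSVC(C=1.0, class_weight="balanced", max_iter=10000).fit(X, labels)

    for name, model in [("plain", svm_plain), ("balanced", svm_weighted)]:
        w = model.coef_.ravel()
        print(name, np.round(w / np.linalg.norm(w), 2))  # normal vector used as a direction
    ```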

  • Principal support vector machines for linear and nonlinear sufficient Dimension Reduction
    Annals of Statistics, 2011
    Co-Authors: Andreas Artemiou
    Abstract:

    We introduce a principal support vector machine (PSVM) approach that can be used for both linear and nonlinear sufficient Dimension Reduction. The basic idea is to divide the response variables into slices and use a modified form of support vector machine to find the optimal hyperplanes that separate them. These optimal hyperplanes are then aligned by the principal components of their normal vectors. It is proved that the aligned normal vectors provide an unbiased, √n-consistent, and asymptotically normal estimator of the sufficient Dimension Reduction space. The method is then generalized to nonlinear sufficient Dimension Reduction using the reproducing kernel Hilbert space. In that context, the aligned normal vectors become functions and it is proved that they are unbiased in the sense that they are functions of the true nonlinear sufficient predictors. We compare PSVM with other sufficient Dimension Reduction methods by simulation and in real data analysis, and through both comparisons firmly establish its practical advantages.
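
    A compact sketch of the linear recipe as described in the abstract, under assumptions noted in the comments: standardize the predictors, slice the response at several cut points, fit a soft-margin linear SVM separating the slices at each cut, and take principal directions of the collected normal vectors (mapped back to the original scale) as estimated sufficient Dimension Reduction directions. The tuning constants, slicing scheme, and simulated model are illustrative, and this is a plain-vanilla reading of the procedure rather than the authors' exact estimator.

    ```python
    # Minimal sketch of a linear principal-SVM-type SDR estimator (assumptions
    # noted in the text above): slice the response, fit one linear SVM per
    # slicing point, then take principal directions of the fitted normal vectors.
    import numpy as np
    from sklearn.svm import LinearSVC

    def principal_svm_directions(X, y, n_slices=10, n_dirs=1, C=1.0):
        # Standardize the predictors so the SVMs are fitted on the Z-scale.
        mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)
        evals, evecs = np.linalg.eigh(cov)
        inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
        Z = (X - mean) @ inv_sqrt

        # One binary SVM per slicing point of the response; collect normal vectors.
        normals = []
        for q in np.linspace(0.1, 0.9, n_slices):
            labels = (y > np.quantile(y, q)).astype(int)
            svm = LinearSVC(C=C, class_weight="balanced", max_iter=10000).fit(Z, labels)
            normals.append(svm.coef_.ravel())

        # Principal directions of the normal vectors span the estimated subspace
        # on the Z-scale; map them back to the original X-scale.
        _, _, Vt = np.linalg.svd(np.array(normals), full_matrices=False)
        return inv_sqrt @ Vt[:n_dirs].T          # (p, n_dirs)

    # Illustrative usage on a single-index model (assumption):
    rng = np.random.default_rng(3)
    X = rng.normal(size=(600, 8))
    beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
    y = np.exp(X @ beta / 2) + 0.2 * rng.normal(size=600)
    B = principal_svm_directions(X, y, n_dirs=1)
    print(np.round(B.ravel() / np.linalg.norm(B), 2))  # close to +/- beta / ||beta||
    ```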

Vipin Kumar - One of the best experts on this subject based on the ideXlab platform.

  • IDR/QR: an incremental Dimension Reduction algorithm via QR decomposition
    IEEE Transactions on Knowledge and Data Engineering, 2005
    Co-Authors: Hui Xiong, Haesun Park, Ravi Janardan, Vipin Kumar
    Abstract:

    Dimension Reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-Dimensional data. In the literature, a well-known Dimension Reduction algorithm is linear discriminant analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of singular value decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental Dimension Reduction algorithm, called IDR/QR, which applies QR decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced Dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically.

  • IDR/QR: an incremental Dimension Reduction algorithm via QR decomposition
    Knowledge Discovery and Data Mining, 2004
    Co-Authors: Hui Xiong, Haesun Park, Ravi Janardan, Vipin Kumar
    Abstract:

    Dimension Reduction is critical for many database and data mining applications, such as efficient storage and retrieval of high-Dimensional data. In the literature, a well-known Dimension Reduction scheme is Linear Discriminant Analysis (LDA). The common aspect of previously proposed LDA based algorithms is the use of Singular Value Decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there is little work on designing incremental LDA algorithms. In this paper, we propose an LDA based incremental Dimension Reduction algorithm, called IDR/QR, which applies QR Decomposition rather than SVD. Unlike other LDA based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification accuracy on the reduced Dimensional space. Our experiments on several real-world data sets reveal that the accuracy achieved by the IDR/QR algorithm is very close to the best possible accuracy achieved by other LDA based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are dynamically inserted.

Changshui Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Flexible manifold embedding: a framework for semi-supervised and unsupervised Dimension Reduction
    IEEE Transactions on Image Processing, 2010
    Co-Authors: Dong Xu, Ivor W Tsang, Changshui Zhang
    Abstract:

    We propose a unified manifold learning framework for semi-supervised and unsupervised Dimension Reduction that employs a simple but effective linear regression function to map new data points. For semi-supervised Dimension Reduction, we aim to find the optimal prediction labels F for all the training samples X, the linear regression function h(X), and the regression residue F0 = F - h(X) simultaneously. Our new objective function integrates two terms related to label fitness and manifold smoothness, as well as a flexible penalty term defined on the residue F0. Our semi-supervised learning framework, referred to as flexible manifold embedding (FME), can effectively utilize label information from labeled data as well as the manifold structure of both labeled and unlabeled data. By modeling the mismatch between h(X) and F, we show that FME relaxes the hard linear constraint F = h(X) in manifold regularization (MR), allowing it to better cope with data sampled from a nonlinear manifold. In addition, we propose a simplified version (referred to as FME/U) for unsupervised Dimension Reduction. We also show that our proposed framework provides a unified view for explaining and understanding many semi-supervised, supervised, and unsupervised Dimension Reduction techniques. Comprehensive experiments on several benchmark databases demonstrate significant improvements over existing Dimension Reduction algorithms.
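
    The two ingredients FME combines can be illustrated with a deliberately simplified two-step stand-in (not the joint FME optimization): first estimate soft labels F from a graph Laplacian built on labeled and unlabeled points, then fit a flexible linear regression h(X) to F so that new points can be mapped without forcing F = h(X). The graph construction, weights, and data set are assumptions made for illustration.

    ```python
    # A two-step simplification of the two ingredients FME combines (not the
    # joint FME optimization): (1) graph-based soft labels F from labeled and
    # unlabeled points, (2) a ridge regression h(X) that maps new points, so
    # that F is not forced to equal h(X) exactly.
    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.neighbors import kneighbors_graph
    from sklearn.linear_model import Ridge

    X, y = make_moons(n_samples=300, noise=0.08, random_state=0)
    rng = np.random.default_rng(0)
    labeled = np.zeros(len(y), dtype=bool)
    labeled[rng.choice(len(y), size=10, replace=False)] = True

    # Graph Laplacian L from a symmetrized kNN graph (assumed graph construction).
    A = kneighbors_graph(X, n_neighbors=10, mode="connectivity").toarray()
    A = np.maximum(A, A.T)
    L = np.diag(A.sum(axis=1)) - A

    # Label-fitness weights U: large on labeled points, zero on unlabeled ones.
    U = np.diag(np.where(labeled, 100.0, 0.0))
    Y = np.zeros((len(y), 2))
    Y[labeled, y[labeled]] = 1.0                 # one-hot targets for labeled points

    # Step 1: soft labels F minimizing tr((F-Y)'U(F-Y)) + tr(F'LF);
    # a tiny ridge term keeps the linear system well conditioned.
    F = np.linalg.solve(U + L + 1e-6 * np.eye(len(y)), U @ Y)

    # Step 2: flexible linear mapping h(X) ~ F, usable for out-of-sample points.
    reg = Ridge(alpha=1.0).fit(X, F)
    pred = reg.predict(X).argmax(axis=1)
    print("agreement with the true labels:", (pred == y).mean())
    ```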

Dennis R Cook - One of the best experts on this subject based on the ideXlab platform.

  • Coordinate-independent sparse sufficient Dimension Reduction and variable selection
    Annals of Statistics, 2010
    Co-Authors: Xin Chen, Dennis R Cook
    Abstract:

    Sufficient Dimension Reduction (SDR) in regression, which reduces the Dimension by replacing the original predictors with a minimal set of their linear combinations without loss of information, is very helpful when the number of predictors is large. Standard SDR methods suffer because the estimated linear combinations usually involve all of the original predictors, making them difficult to interpret. In this paper, we propose a unified method, coordinate-independent sparse estimation (CISE), that can simultaneously achieve sparse sufficient Dimension Reduction and screen out irrelevant and redundant variables efficiently. CISE is subspace oriented in the sense that it incorporates a coordinate-independent penalty term with a broad series of model-based and model-free SDR approaches. This results in a Grassmann manifold optimization problem, for which a fast algorithm is suggested. Under mild conditions, based on manifold theories and techniques, it can be shown that CISE performs asymptotically as well as if the true irrelevant predictors were known, which is referred to as the oracle property. Simulation studies and a real-data example demonstrate the effectiveness and efficiency of the proposed approach.

  • Dimension Reduction in regression without matrix inversion
    Biometrika, 2007
    Co-Authors: Dennis R Cook, Francesca Chiaromonte
    Abstract:

    Regressions in which the fixed number of predictors p exceeds the number of independent observational units n occur in a variety of scientific fields. Sufficient Dimension Reduction provides a promising approach to such problems by restricting attention to d < n linear combinations of the original p predictors. However, standard methods of sufficient Dimension Reduction require inversion of the sample predictor covariance matrix. We propose a method for estimating the central subspace that eliminates the need for such inversion and is applicable regardless of the (n, p) relationship. Simulations show that our method compares favourably with standard large-sample techniques when the latter are applicable. We illustrate our method with a genomics application.
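
    The motivating difficulty is easy to see numerically: when p exceeds n, the sample predictor covariance matrix has rank at most n - 1 and cannot be inverted, so any method that requires its inverse breaks down. The short check below, with arbitrary illustrative dimensions, makes this concrete.

    ```python
    # When p > n, the sample covariance of the predictors is rank deficient,
    # so standard SDR methods that invert it break down. Dimensions are
    # illustrative.
    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 30, 100                      # fewer observations than predictors
    X = rng.normal(size=(n, p))
    S = np.cov(X, rowvar=False)         # (p, p) sample predictor covariance

    print("rank of S:", np.linalg.matrix_rank(S), "out of", p)  # at most n - 1 = 29
    print("smallest eigenvalue of S:", np.linalg.eigvalsh(S).min())
    print("condition number of S:", np.linalg.cond(S))          # effectively infinite
    ```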

  • Dimension Reduction in binary response regression
    Journal of the American Statistical Association, 1999
    Co-Authors: Dennis R Cook, Hakbae Lee
    Abstract:

    The idea of Dimension Reduction without loss of information can be quite helpful for guiding the construction of summary plots in regression without requiring a prespecified model. Focusing on the central subspace, we investigate such “sufficient” Dimension Reduction in regressions with a binary response. Three existing methods (sliced inverse regression, principal Hessian directions, and sliced average variance estimation) and one new method (difference of covariances) are studied for their ability to estimate the central subspace and produce sufficient summary plots. Combining these numerical methods with the graphical methods proposed earlier by Cook leads to a novel paradigm for the analysis of binary response regressions.
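
    Sliced inverse regression is the simplest of the methods compared, and with a binary response it yields a single direction, essentially the standardized difference of the two class means; this is one reason second-moment methods such as sliced average variance estimation and the difference of covariances are also considered. Below is a minimal SIR-for-binary-response sketch on simulated data chosen purely for illustration.

    ```python
    # Minimal sliced inverse regression (SIR) for a binary response: with only
    # two slices it yields a single direction, essentially the standardized
    # difference of the class means. The simulated model is an assumption.
    import numpy as np

    rng = np.random.default_rng(5)
    n, p = 1000, 6
    beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0])
    X = rng.normal(size=(n, p))
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))    # single-index binary model
    y = rng.binomial(1, prob)

    # Standardize the predictors.
    mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # SIR candidate matrix: frequency-weighted outer products of the slice means.
    M = sum((y == c).mean() * np.outer(Z[y == c].mean(axis=0), Z[y == c].mean(axis=0))
            for c in (0, 1))
    _, V = np.linalg.eigh(M)
    direction = inv_sqrt @ V[:, -1]             # leading eigenvector, back on the X scale
    direction = direction / np.linalg.norm(direction)
    print(np.round(direction, 2))               # close to +/- beta / ||beta||
    ```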