Dimension Reduction

The Experts below are selected from a list of 76,017 Experts worldwide, ranked by the ideXlab platform.

Haesun Park - One of the best experts on this subject based on the ideXlab platform.

  • Dimension Reduction in text classification with support vector machines
    Journal of Machine Learning Research, 2005
    Co-Authors: Peg Howland, Haesun Park
    Abstract:

    Support vector machines (SVMs) have been recognized as one of the most successful classification methods for many applications, including text classification. Even though the learning ability and computational complexity of training support vector machines may be independent of the Dimension of the feature space, reducing computational cost remains essential for efficiently handling the large number of terms that arise in practical text classification applications. In this paper, we adopt novel Dimension Reduction methods to reduce the Dimension of the document vectors dramatically. We also introduce decision functions for the centroid-based classification algorithm and for support vector classifiers to handle the classification problem in which a document may belong to multiple classes. Our extensive experimental results show that with several Dimension Reduction methods designed particularly for clustered data, higher efficiency for both training and testing can be achieved without sacrificing the prediction accuracy of text classification, even when the Dimension of the input space is significantly reduced.
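
    As a rough, hedged illustration of the kind of pipeline described above, the sketch below reduces TF-IDF document vectors to the space spanned by an orthonormal basis of the class centroids (one of the simplest Dimension Reduction schemes designed for clustered data) and then trains a linear SVM in the reduced space. The corpus, the QR-based centroid projection, and all parameter choices are illustrative assumptions, not the authors' exact methods.

    ```python
    # A minimal sketch, not the paper's exact method: centroid-based dimension
    # reduction (project documents onto an orthonormal basis of the k class
    # centroids) followed by a linear SVM in the reduced k-dimensional space.
    import numpy as np
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    # Illustrative corpus (assumption): a small subset of 20 Newsgroups.
    data = fetch_20newsgroups(subset="train",
                              categories=["sci.space", "rec.autos", "sci.med"])
    X = TfidfVectorizer(max_features=5000).fit_transform(data.data).toarray()
    y = np.array(data.target)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # Centroid matrix C: one column per class, each column a mean document vector.
    classes = np.unique(y_tr)
    C = np.column_stack([X_tr[y_tr == c].mean(axis=0) for c in classes])  # (terms, k)

    # Orthonormal basis for span(C) via thin QR; project documents to k dimensions.
    Q, _ = np.linalg.qr(C)            # Q: (terms, k)
    Z_tr, Z_te = X_tr @ Q, X_te @ Q   # reduced k-dimensional representations

    clf = LinearSVC(max_iter=10000).fit(Z_tr, y_tr)
    print("accuracy in the reduced space:", clf.score(Z_te, y_te))
    ```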

  • IDR/QR: an incremental Dimension Reduction algorithm via QR decomposition
    IEEE Transactions on Knowledge and Data Engineering, 2005
    Co-Authors: Hui Xiong, Haesun Park, Ravi Janardan, Vipin Kumar
    Abstract:

    Dimension Reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-Dimensional data. In the literature, a well-known Dimension Reduction algorithm is linear discriminant analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of singular value decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental Dimension Reduction algorithm, called IDR/QR, which applies QR decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced Dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically.
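
    A rough sketch of the batch (non-incremental) phase of an IDR/QR-style reduction, assuming a dense data matrix in memory: QR-factorize the class-centroid matrix, project the data into the k-dimensional span of the centroids, and solve the small k-by-k regularized discriminant eigenproblem there. The incremental QR-updating step that gives the algorithm its efficiency is omitted, and the regularization constant and synthetic data are illustrative assumptions.

    ```python
    # Sketch of the batch phase of an IDR/QR-style reduction; the incremental
    # QR-updating machinery described in the paper is omitted here.
    import numpy as np

    def idr_qr_batch(X, y, mu=1.0):
        """X: (n, d) data matrix, y: (n,) class labels. Returns a (d, k) transform."""
        classes = np.unique(y)
        k = len(classes)

        # Class-centroid matrix C (d, k) and its thin QR factorization.
        C = np.column_stack([X[y == c].mean(axis=0) for c in classes])
        Q, _ = np.linalg.qr(C)              # Q: (d, k) orthonormal basis of span(C)

        # Work in the reduced k-dimensional space Z = X Q.
        Z = X @ Q
        zbar = Z.mean(axis=0)
        Sw = np.zeros((k, k))               # reduced within-class scatter
        Sb = np.zeros((k, k))               # reduced between-class scatter
        for c in classes:
            Zc = Z[y == c]
            mc = Zc.mean(axis=0)
            Sw += (Zc - mc).T @ (Zc - mc)
            Sb += len(Zc) * np.outer(mc - zbar, mc - zbar)

        # Small regularized discriminant eigenproblem, solved in k dimensions only.
        evals, G = np.linalg.eig(np.linalg.solve(Sw + mu * np.eye(k), Sb))
        order = np.argsort(np.real(evals))[::-1]
        G = np.real(G[:, order])
        return Q @ G                         # (d, k); reduce new data via X_new @ W

    # Illustrative usage on synthetic data (assumption):
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(m, 1.0, size=(50, 30)) for m in (0.0, 1.5, 3.0)])
    y = np.repeat([0, 1, 2], 50)
    W = idr_qr_batch(X, y)
    print("reduced training data shape:", (X @ W).shape)   # (150, 3)
    ```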

  • IDR/QR: an incremental Dimension Reduction algorithm via QR decomposition
    Knowledge Discovery and Data Mining, 2004
    Co-Authors: Hui Xiong, Haesun Park, Ravi Janardan, Vipin Kumar
    Abstract:

    Dimension Reduction is critical for many database and data mining applications, such as efficient storage and retrieval of high-Dimensional data. In the literature, a well-known Dimension Reduction scheme is Linear Discriminant Analysis (LDA). The common aspect of previously proposed LDA based algorithms is the use of Singular Value Decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there is little work on designing incremental LDA algorithms. In this paper, we propose an LDA based incremental Dimension Reduction algorithm, called IDR/QR, which applies QR Decomposition rather than SVD. Unlike other LDA based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification accuracy on the reduced Dimensional space. Our experiments on several real-world data sets reveal that the accuracy achieved by the IDR/QR algorithm is very close to the best possible accuracy achieved by other LDA based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are dynamically inserted.

Andreas Artemiou - One of the best experts on this subject based on the ideXlab platform.

  • Nonlinear Dimension Reduction for conditional quantiles
    Advances in Data Analysis and Classification, 2021
    Co-Authors: Eliana Christou, Annabel Settle, Andreas Artemiou
    Abstract:

    In practice, data often display heteroscedasticity, making quantile regression (QR) a more appropriate methodology. Modeling the data while maintaining a flexible nonparametric fit requires smoothing over a high-Dimensional space, which might not be feasible when the number of predictor variables is large. This necessitates Dimension Reduction techniques for conditional quantiles, which extract linear combinations of the predictor variables without losing any information about the conditional quantile. However, nonlinear features can achieve greater Dimension Reduction. We therefore present the first nonlinear extension of the linear algorithm for estimating the central quantile subspace (CQS) using kernel data. First, we describe the feature CQS within the framework of reproducing kernel Hilbert spaces, and second, we illustrate its performance through simulation examples and real data applications. Specifically, we emphasize visualizing various aspects of the data structure using the first two feature extractors, and we highlight the ability to combine the proposed algorithm with linear classification and regression algorithms. The results show that the feature CQS is an effective kernel tool for performing nonlinear Dimension Reduction for conditional quantiles.
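
    The linear idea being extended can be illustrated crudely: under a heteroscedastic single-index model, a linear quantile regression already extracts one linear combination of the predictors that carries the conditional-quantile information. The sketch below is only this linear, single-direction toy (via statsmodels' QuantReg), not the kernel/RKHS feature-CQS estimator of the paper; the simulated model and the quantile level are assumptions.

    ```python
    # Toy illustration (not the paper's kernel CQS estimator): extract a single
    # linear combination of the predictors for a conditional quantile by fitting
    # a linear quantile regression and normalizing its coefficient vector.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n, p = 500, 5
    beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0])    # true index direction (assumption)
    X = rng.normal(size=(n, p))
    index = X @ beta
    y = index + (0.5 + 0.5 * np.abs(index)) * rng.normal(size=n)  # heteroscedastic noise

    tau = 0.75
    fit = sm.QuantReg(y, sm.add_constant(X)).fit(q=tau)
    direction = fit.params[1:]                      # drop the intercept
    direction = direction / np.linalg.norm(direction)

    # Alignment of the extracted direction with the true index direction.
    cosine = abs(direction @ beta) / np.linalg.norm(beta)
    print(f"tau={tau}: |cos(angle to true direction)| = {cosine:.3f}")
    ```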

  • A study on imbalance support vector machine algorithms for sufficient Dimension Reduction
    Communications in Statistics - Theory and Methods, 2017
    Co-Authors: Luke Smallman, Andreas Artemiou
    Abstract:

    Li et al. (2011) presented the novel idea of using support vector machines (SVMs) to perform sufficient Dimension Reduction. In this work, we investigate the potential improvement in recovering the Dimension Reduction subspace when the SVM algorithm is changed to treat imbalance, based on several proposals in the machine learning literature. We find that, in most situations, treating the imbalanced nature of the slices helps improve the estimation. Our results are verified through simulation and real data applications.
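
    One common way to treat the imbalance in a per-slice SVM fit is to reweight the two slice labels inversely to their frequencies. The snippet below shows that adjustment in isolation for a single extreme slicing point, using scikit-learn's class_weight="balanced" option; it is a hedged illustration of the general idea, not the specific reweighting schemes compared in the paper, and the simulated data are an assumption.

    ```python
    # Reweighting an imbalanced slice in an SVM-based SDR step: slicing the
    # response at an extreme quantile makes the two labels unbalanced, and
    # class_weight="balanced" reweights them inversely to their frequencies.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(2)
    X = rng.normal(size=(400, 6))
    y = X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=400)  # toy regression model (assumption)

    cut = np.quantile(y, 0.9)               # extreme slicing point -> imbalanced labels
    labels = (y > cut).astype(int)

    svm_plain = LinearSVC(C=1.0, max_iter=10000).fit(X, labels)
    svm_weighted = LinearSVC(C=1.0, class_weight="balanced", max_iter=10000).fit(X, labels)

    for name, model in [("plain", svm_plain), ("balanced", svm_weighted)]:
        w = model.coef_.ravel()
        print(name, np.round(w / np.linalg.norm(w), 2))  # normal vector used as a direction
    ```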

  • Principal support vector machines for linear and nonlinear sufficient Dimension Reduction
    Annals of Statistics, 2011
    Co-Authors: Andreas Artemiou
    Abstract:

    We introduce a principal support vector machine (PSVM) approach that can be used for both linear and nonlinear sufficient Dimension Reduction. The basic idea is to divide the response variables into slices and use a modified form of support vector machine to find the optimal hyperplanes that separate them. These optimal hyperplanes are then aligned by the principal components of their normal vectors. It is proved that the aligned normal vectors provide an unbiased, √n-consistent, and asymptotically normal estimator of the sufficient Dimension Reduction space. The method is then generalized to nonlinear sufficient Dimension Reduction using the reproducing kernel Hilbert space. In that context, the aligned normal vectors become functions and it is proved that they are unbiased in the sense that they are functions of the true nonlinear sufficient predictors. We compare PSVM with other sufficient Dimension Reduction methods by simulation and in real data analysis, and through both comparisons firmly establish its practical advantages.
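
    A compact sketch of the linear recipe as described in the abstract, under assumptions noted in the comments: standardize the predictors, slice the response at several cut points, fit a soft-margin linear SVM separating the slices at each cut, and take principal directions of the collected normal vectors (mapped back to the original scale) as estimated sufficient Dimension Reduction directions. The tuning constants, slicing scheme, and simulated model are illustrative, and this is a plain-vanilla reading of the procedure rather than the authors' exact estimator.

    ```python
    # Minimal sketch of a linear principal-SVM-type SDR estimator (assumptions
    # noted in the text above): slice the response, fit one linear SVM per
    # slicing point, then take principal directions of the fitted normal vectors.
    import numpy as np
    from sklearn.svm import LinearSVC

    def principal_svm_directions(X, y, n_slices=10, n_dirs=1, C=1.0):
        # Standardize the predictors so the SVMs are fitted on the Z-scale.
        mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)
        evals, evecs = np.linalg.eigh(cov)
        inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
        Z = (X - mean) @ inv_sqrt

        # One binary SVM per slicing point of the response; collect normal vectors.
        normals = []
        for q in np.linspace(0.1, 0.9, n_slices):
            labels = (y > np.quantile(y, q)).astype(int)
            svm = LinearSVC(C=C, class_weight="balanced", max_iter=10000).fit(Z, labels)
            normals.append(svm.coef_.ravel())

        # Principal directions of the normal vectors span the estimated subspace
        # on the Z-scale; map them back to the original X-scale.
        _, _, Vt = np.linalg.svd(np.array(normals), full_matrices=False)
        return inv_sqrt @ Vt[:n_dirs].T          # (p, n_dirs)

    # Illustrative usage on a single-index model (assumption):
    rng = np.random.default_rng(3)
    X = rng.normal(size=(600, 8))
    beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
    y = np.exp(X @ beta / 2) + 0.2 * rng.normal(size=600)
    B = principal_svm_directions(X, y, n_dirs=1)
    print(np.round(B.ravel() / np.linalg.norm(B), 2))  # close to +/- beta / ||beta||
    ```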

Vipin Kumar - One of the best experts on this subject based on the ideXlab platform.

  • IDR/QR: an incremental Dimension Reduction algorithm via QR decomposition
    IEEE Transactions on Knowledge and Data Engineering, 2005
    Co-Authors: Hui Xiong, Haesun Park, Ravi Janardan, Vipin Kumar
    Abstract:

    Dimension Reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-Dimensional data. In the literature, a well-known Dimension Reduction algorithm is linear discriminant analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of singular value decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental Dimension Reduction algorithm, called IDR/QR, which applies QR decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced Dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically.

  • IDR/QR: an incremental Dimension Reduction algorithm via QR decomposition
    Knowledge Discovery and Data Mining, 2004
    Co-Authors: Hui Xiong, Haesun Park, Ravi Janardan, Vipin Kumar
    Abstract:

    Dimension Reduction is critical for many database and data mining applications, such as efficient storage and retrieval of high-Dimensional data. In the literature, a well-known Dimension Reduction scheme is Linear Discriminant Analysis (LDA). The common aspect of previously proposed LDA based algorithms is the use of Singular Value Decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there is little work on designing incremental LDA algorithms. In this paper, we propose an LDA based incremental Dimension Reduction algorithm, called IDR/QR, which applies QR Decomposition rather than SVD. Unlike other LDA based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification accuracy on the reduced Dimensional space. Our experiments on several real-world data sets reveal that the accuracy achieved by the IDR/QR algorithm is very close to the best possible accuracy achieved by other LDA based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are dynamically inserted.

Changshui Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Flexible manifold embedding: a framework for semi-supervised and unsupervised Dimension Reduction
    IEEE Transactions on Image Processing, 2010
    Co-Authors: Dong Xu, Ivor W Tsang, Changshui Zhang
    Abstract:

    We propose a unified manifold learning framework for semi-supervised and unsupervised Dimension Reduction that employs a simple but effective linear regression function to map new data points. For semi-supervised Dimension Reduction, we aim to find the optimal prediction labels F for all the training samples X, the linear regression function h(X), and the regression residue F0 = F - h(X) simultaneously. Our new objective function integrates two terms related to label fitness and manifold smoothness, as well as a flexible penalty term defined on the residue F0. Our semi-supervised learning framework, referred to as flexible manifold embedding (FME), can effectively utilize label information from labeled data as well as the manifold structure of both labeled and unlabeled data. By modeling the mismatch between h(X) and F, we show that FME relaxes the hard linear constraint F = h(X) in manifold regularization (MR), allowing it to better cope with data sampled from a nonlinear manifold. In addition, we propose a simplified version (referred to as FME/U) for unsupervised Dimension Reduction. We also show that our proposed framework provides a unified view for explaining and understanding many semi-supervised, supervised, and unsupervised Dimension Reduction techniques. Comprehensive experiments on several benchmark databases demonstrate significant improvements over existing Dimension Reduction algorithms.
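
    The two ingredients FME combines can be illustrated with a deliberately simplified two-step stand-in (not the joint FME optimization): first estimate soft labels F from a graph Laplacian built on labeled and unlabeled points, then fit a flexible linear regression h(X) to F so that new points can be mapped without forcing F = h(X). The graph construction, weights, and data set are assumptions made for illustration.

    ```python
    # A two-step simplification of the two ingredients FME combines (not the
    # joint FME optimization): (1) graph-based soft labels F from labeled and
    # unlabeled points, (2) a ridge regression h(X) that maps new points, so
    # that F is not forced to equal h(X) exactly.
    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.neighbors import kneighbors_graph
    from sklearn.linear_model import Ridge

    X, y = make_moons(n_samples=300, noise=0.08, random_state=0)
    rng = np.random.default_rng(0)
    labeled = np.zeros(len(y), dtype=bool)
    labeled[rng.choice(len(y), size=10, replace=False)] = True

    # Graph Laplacian L from a symmetrized kNN graph (assumed graph construction).
    A = kneighbors_graph(X, n_neighbors=10, mode="connectivity").toarray()
    A = np.maximum(A, A.T)
    L = np.diag(A.sum(axis=1)) - A

    # Label-fitness weights U: large on labeled points, zero on unlabeled ones.
    U = np.diag(np.where(labeled, 100.0, 0.0))
    Y = np.zeros((len(y), 2))
    Y[labeled, y[labeled]] = 1.0                 # one-hot targets for labeled points

    # Step 1: soft labels F minimizing tr((F-Y)'U(F-Y)) + tr(F'LF);
    # a tiny ridge term keeps the linear system well conditioned.
    F = np.linalg.solve(U + L + 1e-6 * np.eye(len(y)), U @ Y)

    # Step 2: flexible linear mapping h(X) ~ F, usable for out-of-sample points.
    reg = Ridge(alpha=1.0).fit(X, F)
    pred = reg.predict(X).argmax(axis=1)
    print("agreement with the true labels:", (pred == y).mean())
    ```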

Dennis R Cook - One of the best experts on this subject based on the ideXlab platform.

  • Coordinate-independent sparse sufficient Dimension Reduction and variable selection
    Annals of Statistics, 2010
    Co-Authors: Xin Chen, Dennis R Cook
    Abstract:

    Sufficient Dimension Reduction (SDR) in regression, which reduces the Dimension by replacing the original predictors with a minimal set of their linear combinations without loss of information, is very helpful when the number of predictors is large. Standard SDR methods suffer because the estimated linear combinations usually involve all of the original predictors, making them difficult to interpret. In this paper, we propose a unified method, coordinate-independent sparse estimation (CISE), that can simultaneously achieve sparse sufficient Dimension Reduction and screen out irrelevant and redundant variables efficiently. CISE is subspace oriented in the sense that it incorporates a coordinate-independent penalty term with a broad series of model-based and model-free SDR approaches. This results in a Grassmann manifold optimization problem, for which a fast algorithm is suggested. Under mild conditions, based on manifold theories and techniques, it can be shown that CISE performs asymptotically as well as if the true irrelevant predictors were known, which is referred to as the oracle property. Simulation studies and a real-data example demonstrate the effectiveness and efficiency of the proposed approach.

  • Dimension Reduction in regression without matrix inversion
    Biometrika, 2007
    Co-Authors: Dennis R Cook, Francesca Chiaromonte
    Abstract:

    Regressions in which the fixed number of predictors p exceeds the number of independent observational units n occur in a variety of scientific fields. Sufficient Dimension Reduction provides a promising approach to such problems by restricting attention to d < n linear combinations of the original p predictors. However, standard methods of sufficient Dimension Reduction require inversion of the sample predictor covariance matrix. We propose a method for estimating the central subspace that eliminates the need for such inversion and is applicable regardless of the (n, p) relationship. Simulations show that our method compares favourably with standard large-sample techniques when the latter are applicable. We illustrate our method with a genomics application.
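
    The motivating difficulty is easy to see numerically: when p exceeds n, the sample predictor covariance matrix has rank at most n - 1 and cannot be inverted, so any method that requires its inverse breaks down. The short check below, with arbitrary illustrative dimensions, makes this concrete.

    ```python
    # When p > n, the sample covariance of the predictors is rank deficient,
    # so standard SDR methods that invert it break down. Dimensions are
    # illustrative.
    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 30, 100                      # fewer observations than predictors
    X = rng.normal(size=(n, p))
    S = np.cov(X, rowvar=False)         # (p, p) sample predictor covariance

    print("rank of S:", np.linalg.matrix_rank(S), "out of", p)  # at most n - 1 = 29
    print("smallest eigenvalue of S:", np.linalg.eigvalsh(S).min())
    print("condition number of S:", np.linalg.cond(S))          # effectively infinite
    ```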

  • Dimension Reduction in binary response regression
    Journal of the American Statistical Association, 1999
    Co-Authors: Dennis R Cook, Hakbae Lee
    Abstract:

    The idea of Dimension Reduction without loss of information can be quite helpful for guiding the construction of summary plots in regression without requiring a prespecified model. Focusing on the central subspace, we investigate such “sufficient” Dimension Reduction in regressions with a binary response. Three existing methods (sliced inverse regression, principal Hessian directions, and sliced average variance estimation) and one new method (difference of covariances) are studied for their ability to estimate the central subspace and produce sufficient summary plots. Combining these numerical methods with the graphical methods proposed earlier by Cook leads to a novel paradigm for the analysis of binary response regressions.
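
    Sliced inverse regression is the simplest of the methods compared, and with a binary response it yields a single direction, essentially the standardized difference of the two class means; this is one reason second-moment methods such as sliced average variance estimation and the difference of covariances are also considered. Below is a minimal SIR-for-binary-response sketch on simulated data chosen purely for illustration.

    ```python
    # Minimal sliced inverse regression (SIR) for a binary response: with only
    # two slices it yields a single direction, essentially the standardized
    # difference of the class means. The simulated model is an assumption.
    import numpy as np

    rng = np.random.default_rng(5)
    n, p = 1000, 6
    beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0])
    X = rng.normal(size=(n, p))
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))    # single-index binary model
    y = rng.binomial(1, prob)

    # Standardize the predictors.
    mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # SIR candidate matrix: frequency-weighted outer products of the slice means.
    M = sum((y == c).mean() * np.outer(Z[y == c].mean(axis=0), Z[y == c].mean(axis=0))
            for c in (0, 1))
    _, V = np.linalg.eigh(M)
    direction = inv_sqrt @ V[:, -1]             # leading eigenvector, back on the X scale
    direction = direction / np.linalg.norm(direction)
    print(np.round(direction, 2))               # close to +/- beta / ||beta||
    ```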