Subspace

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 251,652 Experts worldwide ranked by the ideXlab platform

René Vidal - One of the best experts on this subject based on the ideXlab platform.

  • Algebraic Clustering of Affine Subspaces
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017
    Co-Authors: Manolis C. Tsakiris, René Vidal
    Abstract:

    Subspace clustering is an important problem in machine learning with many applications in computer vision and pattern recognition. Prior work has studied this problem using algebraic, iterative, statistical, low-rank and sparse representation techniques. While these methods have been applied to both linear and affine subspaces, theoretical results have only been established in the case of linear subspaces. For example, algebraic subspace clustering (ASC) is guaranteed to provide the correct clustering when the data points are in general position and the union of subspaces is transversal. In this paper we study in a rigorous fashion the properties of ASC in the case of affine subspaces. Using notions from algebraic geometry, we prove that the homogenization trick, which embeds points in a union of affine subspaces into points in a union of linear subspaces, preserves the general position of the points and the transversality of the union of subspaces in the embedded space, thus establishing the correctness of ASC for affine subspaces.
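
    The homogenization trick itself is a one-line operation: append a constant coordinate to every point, so that a $d$-dimensional affine subspace becomes a $(d+1)$-dimensional linear one. A minimal numpy sketch (the column-per-point data layout is an assumption):

```python
import numpy as np

def homogenize(X):
    """Homogenization trick: embed points from a union of affine subspaces
    into a union of linear subspaces by appending a constant coordinate.
    X is (D, N), one data point per column; returns a (D+1, N) array."""
    return np.vstack([X, np.ones((1, X.shape[1]))])

# Points on the affine line y = 1 in R^2 (not a linear subspace) ...
X = np.array([[-2.0, 0.0, 3.0],
              [ 1.0, 1.0, 1.0]])
# ... become points in the 2-dimensional *linear* subspace of R^3
# spanned by (1, 0, 0) and (0, 1, 1).
print(homogenize(X))
```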

  • Filtrated Algebraic Subspace Clustering
    SIAM Journal on Imaging Sciences, 2017
    Co-Authors: Manolis C. Tsakiris, René Vidal
    Abstract:

    Subspace clustering is the problem of clustering data that lie close to a union of linear subspaces. Existing algebraic subspace clustering methods are based on fitting the data with an algebraic variety and decomposing this variety into its constituent subspaces. Such methods are well suited to the case of a known number of subspaces of known and equal dimensions, where a single polynomial vanishing on the variety is sufficient to identify the subspaces. While subspaces of unknown and arbitrary dimensions can be handled using multiple vanishing polynomials, current approaches are not robust to corrupted data due to the difficulty of estimating the number of polynomials. As a consequence, the current practice is to use a single polynomial to fit the data with a union of hyperplanes containing the union of subspaces, an approach that works well only when the dimensions of the subspaces are high enough. In this paper, we propose a new algebraic subspace clustering algorithm, which can identify the subspace ...
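
    For intuition, the core algebraic fitting step can be sketched as follows: embed the points with a degree-$n$ Veronese map (all monomials of degree $n$) and read the coefficients of a vanishing polynomial off the left null space of the embedded data matrix. A hedged numpy sketch for two lines in the plane (the `veronese` helper is written here for illustration and is not the paper's code):

```python
import numpy as np
from itertools import combinations_with_replacement

def veronese(X, n):
    """Degree-n Veronese embedding: map each column of X (D x N) to the
    vector of all its degree-n monomials (illustrative helper)."""
    D, N = X.shape
    monos = list(combinations_with_replacement(range(D), n))
    return np.array([[np.prod(X[list(m), j]) for j in range(N)] for m in monos])

# Two lines in R^2 (the x-axis and the y-axis), i.e. n = 2 subspaces.
X = np.array([[1.0, 2.0, -1.0, 0.0, 0.0,  0.0],
              [0.0, 0.0,  0.0, 1.0, 3.0, -2.0]])
V = veronese(X, 2)     # rows correspond to the monomials x^2, xy, y^2
U, s, _ = np.linalg.svd(V)
c = U[:, -1]           # left null space: coefficients of a vanishing polynomial
print(np.round(c, 3))  # ~ (0, ±1, 0), i.e. p(x, y) = xy, which vanishes
                       # exactly on the union of the two lines
```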

  • Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit
    Computer Vision and Pattern Recognition, 2016
    Co-Authors: Chong You, Daniel P. Robinson, René Vidal
    Abstract:

    Subspace clustering methods based on $\ell_1$, $\ell_2$ or nuclear norm regularization have become very popular due to their simplicity, theoretical guarantees and empirical success. However, the choice of the regularizer can greatly impact both theory and practice. For instance, $\ell_1$ regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad conditions (e.g., arbitrary subspaces and corrupted data). However, it requires solving a large-scale convex optimization problem. On the other hand, $\ell_2$ and nuclear norm regularization provide efficient closed-form solutions, but require very strong assumptions to guarantee a subspace-preserving affinity, e.g., independent subspaces and uncorrupted data. In this paper we study a subspace clustering method based on orthogonal matching pursuit. We show that the method is both computationally efficient and guaranteed to give a subspace-preserving affinity under broad conditions. Experiments on synthetic data verify our theoretical analysis, and applications in handwritten digit and face clustering show that our approach achieves the best trade-off between accuracy and efficiency. Moreover, our approach is the first one to handle 100,000 data points.
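
    A rough sketch of the approach, using scikit-learn's generic OMP solver rather than the authors' optimized implementation (the toy data and the sparsity level k are assumptions):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.cluster import SpectralClustering

def ssc_omp_affinity(X, k):
    """SSC-OMP-style affinity: greedily express each column of X
    (D x N, unit-norm points) as a combination of at most k other points."""
    D, N = X.shape
    C = np.zeros((N, N))
    for j in range(N):
        idx = [i for i in range(N) if i != j]  # exclude the point itself
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
        omp.fit(X[:, idx], X[:, j])
        C[idx, j] = omp.coef_
    return np.abs(C) + np.abs(C).T             # symmetric, nonnegative affinity

# Toy data: two 2-dimensional subspaces of R^5 with mild noise (assumed setup).
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(5, 2)) @ rng.normal(size=(2, 30)),
               rng.normal(size=(5, 2)) @ rng.normal(size=(2, 30))])
X += 0.01 * rng.normal(size=X.shape)
X /= np.linalg.norm(X, axis=0)
W = ssc_omp_affinity(X, k=3)
labels = SpectralClustering(n_clusters=2, affinity='precomputed').fit_predict(W)
print(labels)  # the two blocks of 30 points should receive different labels
```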

  • Oracle Based Active Set Algorithm for Scalable Elastic Net Subspace Clustering
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016
    Co-Authors: Chong You, Daniel P. Robinson, Chun-guang Li, René Vidal
    Abstract:

    State-of-the-art subspace clustering methods are based on expressing each data point as a linear combination of other data points while regularizing the matrix of coefficients with $\ell_1$, $\ell_2$ or nuclear norms. $\ell_1$ regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad theoretical conditions, but the clusters may not be connected. $\ell_2$ and nuclear norm regularization often improve connectivity, but give a subspace-preserving affinity only for independent subspaces. Mixed $\ell_1$, $\ell_2$ and nuclear norm regularizations offer a balance between the subspace-preserving and connectedness properties, but this comes at the cost of increased computational complexity. This paper studies the geometry of the elastic net regularizer (a mixture of the $\ell_1$ and $\ell_2$ norms) and uses it to derive a provably correct and scalable active set method for finding the optimal coefficients. Our geometric analysis also provides a theoretical justification and a geometric interpretation for the balance between the connectedness (due to $\ell_2$ regularization) and subspace-preserving (due to $\ell_1$ regularization) properties for elastic net subspace clustering. Our experiments show that the proposed active set method not only achieves state-of-the-art clustering performance, but also efficiently handles large-scale datasets.
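
    The self-expression step behind this method can be sketched with a generic elastic net solver; the paper's contribution is the oracle-based active-set scheme that avoids this brute-force solve over all columns. A naive sketch with assumed parameter values, not the paper's solver:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def elastic_net_affinity(X, alpha=0.01, l1_ratio=0.9):
    """Code each column of X (D x N) over the remaining columns with a mix
    of l1 and l2 penalties, then symmetrize into an affinity matrix."""
    D, N = X.shape
    C = np.zeros((N, N))
    for j in range(N):
        idx = [i for i in range(N) if i != j]   # self-expression excludes x_j
        enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                          fit_intercept=False, max_iter=10000)
        enet.fit(X[:, idx], X[:, j])
        C[idx, j] = enet.coef_
    return np.abs(C) + np.abs(C).T
```

    Raising `l1_ratio` pushes the affinity toward the sparse, subspace-preserving $\ell_1$ regime; lowering it favors the denser, better-connected $\ell_2$ regime, mirroring the trade-off analyzed in the paper.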

  • Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit
    arXiv: Computer Vision and Pattern Recognition, 2015
    Co-Authors: Chong You, Daniel P. Robinson, René Vidal
    Abstract:

    Subspace clustering methods based on $\ell_1$, $\ell_2$ or nuclear norm regularization have become very popular due to their simplicity, theoretical guarantees and empirical success. However, the choice of the regularizer can greatly impact both theory and practice. For instance, $\ell_1$ regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad conditions (e.g., arbitrary subspaces and corrupted data). However, it requires solving a large-scale convex optimization problem. On the other hand, $\ell_2$ and nuclear norm regularization provide efficient closed-form solutions, but require very strong assumptions to guarantee a subspace-preserving affinity, e.g., independent subspaces and uncorrupted data. In this paper we study a subspace clustering method based on orthogonal matching pursuit. We show that the method is both computationally efficient and guaranteed to give a subspace-preserving affinity under broad conditions. Experiments on synthetic data verify our theoretical analysis, and applications in handwritten digit and face clustering show that our approach achieves the best trade-off between accuracy and efficiency.

Han-wei Shen - One of the best experts on this subject based on the ideXlab platform.

  • High-dimensional data analysis with Subspace comparison using matrix visualization
    Information Visualization, 2017
    Co-Authors: Junpeng Wang, Han-wei Shen
    Abstract:

    Due to the intricate relationship between different dimensions of high-dimensional data, subspace analysis is often conducted to decompose dimensions and give prominence to certain subsets of dimensions, i.e. subspaces. Exploring and comparing subspaces are important to reveal the underlying features of subspaces, as well as to portray the characteristics of individual dimensions. To date, most of the existing high-dimensional data exploration and analysis approaches rely on dimensionality reduction algorithms (e.g. principal component analysis and multi-dimensional scaling) to project high-dimensional data, or their subspaces, to two-dimensional space and employ scatterplots for visualization. However, the dimensionality reduction algorithms are sometimes difficult to fine-tune and scatterplots are not effective for comparative visualization, making subspace comparison hard to perform. In this article, we aggregate high-dimensional data or their subspaces by computing pair-wise distances between all data...
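
    The aggregation idea can be sketched by computing a pairwise-distance matrix inside each subspace and comparing the matrices side by side as heatmaps; the data, the dimension subsets, and the plotting choices below are illustrative assumptions, not the article's:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
data = rng.normal(size=(50, 10))               # 50 points, 10 dimensions
subspaces = {'dims 0-2': [0, 1, 2], 'dims 7-9': [7, 8, 9]}  # assumed subsets

fig, axes = plt.subplots(1, len(subspaces), figsize=(8, 4))
for ax, (name, dims) in zip(axes, subspaces.items()):
    D = squareform(pdist(data[:, dims]))       # pairwise distances in the subspace
    ax.imshow(D, cmap='viridis')               # one cell per point pair
    ax.set_title(name)
plt.show()
```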

Klemens Böhm - One of the best experts on this subject based on the ideXlab platform.

  • Dimension-based Subspace search for outlier detection
    International Journal of Data Science and Analytics, 2019
    Co-Authors: Holger Trittenbach, Klemens Böhm
    Abstract:

    Scientific data often are high dimensional. In such data, finding outliers is challenging because they are often hidden in subspaces, i.e., lower-dimensional projections of the data. With recent approaches to outlier mining, the actual detection of outliers is decoupled from the search for subspaces likely to contain outliers. However, finding such sets of subspaces that contain most or even all outliers of the given data set remains an open problem. While previous proposals use per-subspace measures such as correlation in order to quantify the quality of subspaces, we explicitly take the relationship between subspaces into account and propose a dimension-based measure of that quality. Based on it, we formalize the notion of an optimal set of subspaces and propose the Greedy Maximum Deviation heuristic to approximate this set. Experiments on comprehensive benchmark data show that our concept is more effective in determining the relevant set of subspaces than approaches that use per-subspace measures.
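
    The abstract does not spell out the quality measure, so the following is only a generic greedy sketch in the spirit of a Greedy Maximum Deviation search: seed with the best pair of dimensions, then add one dimension at a time while a caller-supplied score improves. The `quality` argument and the toy correlation score are placeholders, not the paper's measure:

```python
import numpy as np
from itertools import combinations

def greedy_subspace_search(data, quality, max_dim=4):
    """Seed with the best 2-dimensional candidate, then greedily add the
    dimension that most improves the quality score; stop when none helps."""
    dims = range(data.shape[1])
    best, subspace = max((quality(data, list(p)), list(p))
                         for p in combinations(dims, 2))
    while len(subspace) < max_dim:
        score, cand = max((quality(data, subspace + [d]), subspace + [d])
                          for d in dims if d not in subspace)
        if score <= best:
            break                   # no remaining dimension improves the score
        best, subspace = score, cand
    return subspace, best

# Toy quality score (an assumption): mean absolute pairwise correlation.
rng = np.random.default_rng(2)
data = rng.normal(size=(200, 6))
data[:, 1] = data[:, 0] + 0.1 * rng.normal(size=200)  # plant a correlated pair
quality = lambda X, S: float(np.abs(np.corrcoef(X[:, S].T)).mean())
print(greedy_subspace_search(data, quality))          # finds [0, 1]
```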

  • HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
    International Conference on Data Engineering, 2012
    Co-Authors: Fabian Keller, Emmanuel Müller, Klemens Böhm
    Abstract:

    Outlier mining is a major task in data analysis. Outliers are objects that deviate strongly from regular objects in their local neighborhood. Density-based outlier ranking methods score each object based on its degree of deviation. In many applications, these ranking methods degenerate to random listings due to low contrast between outliers and regular objects. Outliers do not show up in the scattered full space; they are hidden in multiple high-contrast subspace projections of the data. Measuring the contrast of such subspaces for outlier ranking is an open research challenge. In this work, we propose a novel subspace search method that selects high-contrast subspaces for density-based outlier ranking. It is designed as a pre-processing step for outlier ranking algorithms. It searches for high-contrast subspaces with a significant amount of conditional dependence among the subspace dimensions. With our approach, we propose a first measure for the contrast of subspaces. Thus, we enhance the quality of traditional outlier rankings by computing outlier scores in high-contrast projections only. The evaluation on real and synthetic data shows that our approach outperforms traditional dimensionality reduction techniques and naive random projections, as well as state-of-the-art subspace search techniques, and provides enhanced quality for outlier ranking.
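
    The contrast measure can be sketched as a Monte Carlo estimate in the HiCS spirit: repeatedly slice the subspace on all but one dimension and measure how far the conditional distribution of the remaining dimension deviates from its marginal. In this hedged sketch, the Kolmogorov-Smirnov statistic and the slice-size heuristic are plausible stand-ins, not necessarily the paper's exact choices:

```python
import numpy as np
from scipy.stats import ks_2samp

def contrast(data, subspace, n_iter=100, slice_frac=0.3, rng=None):
    """Estimate subspace contrast as the average deviation between the
    conditional (inside a random slice) and marginal distributions of a
    randomly chosen comparison dimension."""
    rng = rng or np.random.default_rng()
    n = len(data)
    # Per-dimension slice size so the joint slice keeps ~slice_frac of points.
    per_dim = int(n * slice_frac ** (1.0 / (len(subspace) - 1)))
    devs = []
    for _ in range(n_iter):
        cmp_dim = rng.choice(subspace)           # dimension to test
        mask = np.ones(n, dtype=bool)
        for d in subspace:
            if d == cmp_dim:
                continue
            order = np.argsort(data[:, d])       # random contiguous slice in d
            lo = rng.integers(0, n - per_dim + 1)
            in_slice = np.zeros(n, dtype=bool)
            in_slice[order[lo:lo + per_dim]] = True
            mask &= in_slice
        if mask.sum() > 1:
            devs.append(ks_2samp(data[mask, cmp_dim], data[:, cmp_dim]).statistic)
    return float(np.mean(devs)) if devs else 0.0

rng = np.random.default_rng(3)
data = rng.normal(size=(500, 4))
data[:, 1] = data[:, 0] ** 2 + 0.1 * rng.normal(size=500)  # dependent pair
print(contrast(data, [0, 1], rng=rng))  # high contrast: strong dependence
print(contrast(data, [2, 3], rng=rng))  # low contrast: independent dimensions
```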

Shuicheng Yan - One of the best experts on this subject based on the ideXlab platform.

  • Correlation Adaptive Subspace Segmentation by Trace Lasso
    arXiv: Computer Vision and Pattern Recognition, 2015
    Co-Authors: Jiashi Feng, Zhouchen Lin, Shuicheng Yan
    Abstract:

    This paper studies the subspace segmentation problem. Given a set of data points drawn from a union of subspaces, the goal is to partition them into the underlying subspaces from which they were drawn. The spectral clustering method is used as the framework. It requires finding an affinity matrix that is close to block diagonal, with nonzero entries corresponding to pairs of data points from the same subspace. In this work, we argue that both sparsity and the grouping effect are important for subspace segmentation. A sparse affinity matrix tends to be block diagonal, with fewer connections between data points from different subspaces. The grouping effect ensures that highly correlated data, which usually come from the same subspace, can be grouped together. Sparse Subspace Clustering (SSC), by using $\ell_1$-minimization, encourages sparsity for data selection, but lacks the grouping effect. On the contrary, Low-Rank Representation (LRR), by rank minimization, and Least Squares Regression (LSR), by $\ell_2$-regularization, exhibit a strong grouping effect, but fall short in subset selection. Thus, the affinity matrix obtained by SSC is usually very sparse, while those obtained by LRR and LSR are very dense. In this work, we propose the Correlation Adaptive Subspace Segmentation (CASS) method, by using the trace Lasso. CASS is a data-correlation-dependent method which simultaneously performs automatic data selection and groups correlated data together. It can be regarded as a method that adaptively balances SSC and LSR. Both theoretical and experimental results show the effectiveness of CASS.
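
    The trace Lasso of a coefficient vector $w$ with respect to data $X$ (unit-norm columns) is $\|X\,\mathrm{diag}(w)\|_*$. With orthogonal (uncorrelated) columns it reduces to $\|w\|_1$, and with identical (perfectly correlated) columns to $\|w\|_2$, which is precisely the adaptivity CASS exploits. A small numpy check of the two extremes:

```python
import numpy as np

def trace_lasso(X, w):
    """Trace Lasso penalty ||X diag(w)||_* (sum of singular values)."""
    return np.linalg.norm(X @ np.diag(w), ord='nuc')

w = np.array([3.0, -4.0])
X_orth = np.eye(2)                           # uncorrelated (orthonormal) columns
X_same = np.array([[1.0, 1.0],
                   [0.0, 0.0]])              # perfectly correlated columns
print(trace_lasso(X_orth, w))                # 7.0 = ||w||_1
print(trace_lasso(X_same, w))                # 5.0 = ||w||_2
```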

  • Correlation Adaptive Subspace Segmentation by Trace Lasso
    International Conference on Computer Vision, 2013
    Co-Authors: Jiashi Feng, Zhouchen Lin, Shuicheng Yan
    Abstract:

    This paper studies the subspace segmentation problem. Given a set of data points drawn from a union of subspaces, the goal is to partition them into the underlying subspaces from which they were drawn. The spectral clustering method is used as the framework. It requires finding an affinity matrix that is close to block diagonal, with nonzero entries corresponding to pairs of data points from the same subspace. In this work, we argue that both sparsity and the grouping effect are important for subspace segmentation. A sparse affinity matrix tends to be block diagonal, with fewer connections between data points from different subspaces. The grouping effect ensures that highly correlated data, which usually come from the same subspace, can be grouped together. Sparse Subspace Clustering (SSC), by using $\ell_1$-minimization, encourages sparsity for data selection, but lacks the grouping effect. On the contrary, Low-Rank Representation (LRR), by rank minimization, and Least Squares Regression (LSR), by $\ell_2$-regularization, exhibit a strong grouping effect, but fall short in subset selection. Thus, the affinity matrix obtained by SSC is usually very sparse, while those obtained by LRR and LSR are very dense. In this work, we propose the Correlation Adaptive Subspace Segmentation (CASS) method, by using the trace Lasso. CASS is a data-correlation-dependent method which simultaneously performs automatic data selection and groups correlated data together. It can be regarded as a method that adaptively balances SSC and LSR. Both theoretical and experimental results show the effectiveness of CASS.

  • Robust Recovery of Subspace Structures by Low-Rank Representation
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013
    Co-Authors: Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu
    Abstract:

    In this paper, we address the subspace clustering problem. Given a set of data samples (vectors) approximately drawn from a union of multiple subspaces, our goal is to cluster the samples into their respective subspaces and to remove possible outliers as well. To this end, we propose a novel objective function named Low-Rank Representation (LRR), which seeks the lowest-rank representation among all the candidates that can represent the data samples as linear combinations of the bases in a given dictionary. It is shown that the convex program associated with LRR solves the subspace clustering problem in the following sense: when the data is clean, we prove that LRR exactly recovers the true subspace structures; when the data are contaminated by outliers, we prove that under certain conditions LRR can exactly recover the row space of the original data and detect the outliers as well; for data corrupted by arbitrary sparse errors, LRR can also approximately recover the row space with theoretical guarantees. Since the subspace membership is provably determined by the row space, these results further imply that LRR can perform robust subspace clustering and error correction in an efficient and effective way.
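
    For clean data the paper shows that the LRR program $\min_Z \|Z\|_*$ s.t. $X = XZ$ has the closed-form solution $Z^* = VV^\top$, where $X = U\Sigma V^\top$ is the skinny SVD (the shape interaction matrix); corrupted data requires the full convex program with an explicit error term instead. A minimal sketch of the clean case:

```python
import numpy as np

def lrr_clean(X, tol=1e-8):
    """Closed-form LRR for clean data: Z* = V V^T from the skinny SVD of X.
    Under independent subspaces, Z* is block diagonal (one block per subspace)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[s > tol].T       # right singular vectors of the nonzero part
    return V @ V.T

# Toy example: two independent 1-dimensional subspaces of R^4.
rng = np.random.default_rng(4)
X = np.hstack([rng.normal(size=(4, 1)) @ rng.normal(size=(1, 5)),
               rng.normal(size=(4, 1)) @ rng.normal(size=(1, 5))])
Z = lrr_clean(X)
print(np.round(np.abs(Z), 2))  # nonzero entries appear only within each 5x5 block
```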

Junpeng Wang - One of the best experts on this subject based on the ideXlab platform.

  • High-dimensional data analysis with Subspace comparison using matrix visualization
    Information Visualization, 2017
    Co-Authors: Junpeng Wang, Han-wei Shen
    Abstract:

    Due to the intricate relationship between different dimensions of high-dimensional data, subspace analysis is often conducted to decompose dimensions and give prominence to certain subsets of dimensions, i.e. subspaces. Exploring and comparing subspaces are important to reveal the underlying features of subspaces, as well as to portray the characteristics of individual dimensions. To date, most of the existing high-dimensional data exploration and analysis approaches rely on dimensionality reduction algorithms (e.g. principal component analysis and multi-dimensional scaling) to project high-dimensional data, or their subspaces, to two-dimensional space and employ scatterplots for visualization. However, the dimensionality reduction algorithms are sometimes difficult to fine-tune and scatterplots are not effective for comparative visualization, making subspace comparison hard to perform. In this article, we aggregate high-dimensional data or their subspaces by computing pair-wise distances between all data...