Spectral clustering


The experts below are selected from a list of 26,778 experts worldwide, ranked by the ideXlab platform.

Maurizio Filippone - One of the best experts on this subject based on the ideXlab platform.

  • IJCNN - Mini-batch spectral clustering
    2017 International Joint Conference on Neural Networks (IJCNN), 2017
    Co-Authors: Yufei Han, Maurizio Filippone
    Abstract:

    The cost of computing the spectrum of Laplacian matrices hinders the application of spectral clustering to large data sets. While approximations recover computational tractability, they can potentially affect clustering performance. This paper proposes a practical approach to learn spectral clustering, where the spectrum of the Laplacian is recovered by solving a constrained optimization problem using adaptive mini-batch-based stochastic gradient optimization on Stiefel manifolds. Crucially, the proposed approach is formulated so that the memory footprint of the algorithm is low, the cost of each iteration is linear in the number of samples, and convergence to critical points of the objective function is guaranteed. Extensive experimental validation on data sets with up to half a million samples demonstrates its scalability and its ability to outperform state-of-the-art approximate methods to learn spectral clustering for a given computational budget.
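
To make the optimization concrete, here is a minimal sketch of the kind of mini-batch Riemannian procedure the abstract describes, assuming a dense symmetric affinity matrix A whose dominant eigenvectors are sought. The function name, step size, batch estimator, and QR retraction are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def minibatch_spectral_embedding(A, k, batch_size=128, lr=1e-2, n_iters=500, seed=0):
    """Approximate the top-k eigenvectors of a symmetric affinity matrix A by
    stochastic Riemannian gradient ascent on trace(U^T A U) s.t. U^T U = I."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    U, _ = np.linalg.qr(rng.standard_normal((n, k)))  # random point on the Stiefel manifold
    for _ in range(n_iters):
        idx = rng.choice(n, size=batch_size, replace=False)
        # Mini-batch estimate of the Euclidean gradient 2*A @ U: only the
        # sampled rows contribute, rescaled so the estimate stays unbiased.
        G = np.zeros_like(U)
        G[idx] = (n / batch_size) * 2.0 * A[idx] @ U
        # Project the gradient onto the tangent space of the Stiefel manifold at U.
        sym = (U.T @ G + G.T @ U) / 2.0
        G_riem = G - U @ sym
        # Ascent step followed by a QR retraction back onto the manifold.
        Q, R = np.linalg.qr(U + lr * G_riem)
        U = Q * np.sign(np.diag(R))  # fix QR sign ambiguity
    return U
```

Each iteration touches only batch_size rows of A, which is where a per-iteration cost linear in n comes from; the rows of the returned U can then be clustered with k-means as in standard spectral clustering.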

  • Mini-Batch spectral clustering
    arXiv: Machine Learning, 2016
    Co-Authors: Yufei Han, Maurizio Filippone
    Abstract:

    The cost of computing the spectrum of Laplacian matrices hinders the application of spectral clustering to large data sets. While approximations recover computational tractability, they can potentially affect clustering performance. This paper proposes a practical approach to learn spectral clustering based on adaptive stochastic gradient optimization. Crucially, the proposed approach recovers the exact spectrum of Laplacian matrices in the limit of the iterations, and the cost of each iteration is linear in the number of samples. Extensive experimental validation on data sets with up to half a million samples demonstrates its scalability and its ability to outperform state-of-the-art approximate methods to learn spectral clustering for a given computational budget.

Guocan Feng - One of the best experts on this subject based on the ideXlab platform.

  • Spectral clustering: A semi-supervised approach
    Neurocomputing, 2012
    Co-Authors: Weifu Chen, Guocan Feng
    Abstract:

    Graph-based spectral clustering algorithms have developed rapidly in recent years. They are typically posed as discrete combinatorial optimization problems and solved approximately by relaxing them into tractable eigenvalue decomposition problems. In this paper, we first review existing spectral clustering algorithms within a unified framework and give a straightforward explanation of spectral clustering. We also present a novel model that generalizes unsupervised spectral clustering to semi-supervised spectral clustering. Under this model, prior information given by instance-level constraints can be generalized to space-level constraints. We find that an (undirected) graph built on the enlarged prior information is more meaningful, and hence the cluster boundaries are more accurate. Experimental results on toy data, real-world data and image segmentation demonstrate the advantages of the proposed model.
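
As a point of reference, the sketch below shows the simplest way instance-level constraints enter a spectral pipeline: must-link and cannot-link pairs directly edit the affinity matrix before the usual normalized-Laplacian embedding. This is a standard constrained-affinity baseline, not the paper's space-level propagation model; the function name and the Gaussian bandwidth sigma are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def constrained_spectral(X, k, must_link=(), cannot_link=(), sigma=1.0, seed=0):
    """Spectral clustering with pairwise constraints injected into the graph."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma**2))                  # Gaussian affinity
    for i, j in must_link:                            # force high similarity
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:                          # force zero similarity
        W[i, j] = W[j, i] = 0.0
    d = np.maximum(W.sum(1), 1e-12)
    L_sym = np.eye(len(X)) - W / np.sqrt(np.outer(d, d))  # normalized Laplacian
    _, vecs = np.linalg.eigh(L_sym)
    U = vecs[:, :k]                                   # k smallest eigenvalues
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12  # row-normalize
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(U)
```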

  • Spectral clustering with discriminant cuts
    Knowledge-Based Systems, 2012
    Co-Authors: Weifu Chen, Guocan Feng
    Abstract:

    Recently, many k-way spectral clustering algorithms have been proposed that satisfy one or both of the following requirements: between-cluster similarities are minimized and within-cluster similarities are maximized. In this paper, a novel graph-based spectral clustering algorithm called discriminant cut (Dcut) is proposed, which first builds the affinity matrix of a weighted graph and normalizes it with the corresponding regularized Laplacian matrix, and then partitions the vertices into k parts. Dcut has several advantages. First, it is derived from graph partitioning and has a straightforward geometric interpretation. Second, it addresses both requirements simultaneously. Finally, it is computationally feasible, because the NP-hard graph-cut problem can be relaxed into a mild eigenvalue decomposition problem. Experimental results on toy and real data show that Dcut compares favorably with other spectral clustering methods.
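
The pipeline the abstract describes (affinity matrix, normalization by a regularized Laplacian, k-way partition) can be sketched as follows. The regularizer sigma and the normalization (D + sigma*I)^{-1/2} W (D + sigma*I)^{-1/2} are assumptions made for illustration, not necessarily the paper's precise definition of Dcut.

```python
import numpy as np
from sklearn.cluster import KMeans

def dcut_like(W, k, sigma=1.0, seed=0):
    """k-way partition from a regularized normalization of the affinity W."""
    d = W.sum(axis=1)
    # Regularized degree normalization; sigma = 0 recovers the usual
    # normalized-cut matrix D^{-1/2} W D^{-1/2}.
    inv_sqrt = 1.0 / np.sqrt(d + sigma)
    M = W * np.outer(inv_sqrt, inv_sqrt)
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -k:]                    # eigenvectors of the k largest eigenvalues
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(U)
```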

Michael I. Jordan - One of the best experts on this subject based on the ideXlab platform.

  • AISTATS - Dimensionality Reduction for spectral clustering
    2011
    Co-Authors: Donglin Niu, Michael I. Jordan
    Abstract:

    Spectral clustering is a flexible clustering methodology that is applicable to a variety of data types and has the particular virtue that it makes few assumptions on cluster shapes. It has become popular in a variety of application areas, particularly in computational vision and bioinformatics. The approach appears, however, to be particularly sensitive to irrelevant and noisy dimensions in the data. We thus introduce an approach that automatically learns the relevant dimensions and spectral clustering simultaneously. We pursue an augmented form of spectral clustering in which an explicit projection operator is incorporated in the relaxed optimization functional. We optimize this functional over both the projection and the spectral embedding. Experiments on simulated and real data show that this approach yields significant improvements in the performance of spectral clustering.
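
A toy rendition of the joint idea follows: learn an orthonormal projection V together with the spectral embedding by improving the relaxed objective trace(U^T M(V) U), which for fixed V equals the sum of the top-k eigenvalues of the normalized affinity of the projected data. The paper optimizes this functional with proper gradient-based updates; the random search over the Stiefel manifold below is only a stand-in to keep the sketch short, and the function names and sigma are assumptions.

```python
import numpy as np

def embedding_objective(X, V, k, sigma=1.0):
    """Relaxed spectral objective of the data projected by V (orthonormal columns)."""
    Z = X @ V
    sq = ((Z[:, None] - Z[None, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma**2))
    d = np.maximum(W.sum(1), 1e-12)
    M = W / np.sqrt(np.outer(d, d))          # normalized affinity
    return np.linalg.eigvalsh(M)[-k:].sum()  # value of the spectral relaxation

def learn_projection(X, k, q, n_trials=200, step=0.1, seed=0):
    """Greedy search for a q-dimensional projection that improves the embedding."""
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((X.shape[1], q)))
    best = embedding_objective(X, V, k)
    for _ in range(n_trials):
        cand, _ = np.linalg.qr(V + step * rng.standard_normal(V.shape))
        val = embedding_objective(X, cand, k)
        if val > best:
            V, best = cand, val
    return V
```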

  • Fast approximate spectral clustering
    Knowledge Discovery and Data Mining, 2009
    Co-Authors: Donghui Yan, Ling Huang, Michael I. Jordan
    Abstract:

    Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but which has limited applicability to large-scale problems due to its computational complexity of O(n³) in general, with n the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the mis-clustering rate. We develop two concrete instances of our general framework, one based on local k-means clustering (KASP) and one based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy. Specifically, our algorithms outperform k-means by a large margin in terms of accuracy, and run several times faster than approximate spectral clustering based on the Nyström method, with comparable accuracy and a significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to spectrally cluster data sets with a million observations within several minutes.
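
The KASP instance is simple enough to sketch directly: compress the n points to m representative centroids with k-means (the distortion-minimizing local transformation), run exact spectral clustering on the centroids, and let every original point inherit its centroid's label. The sklearn-based pipeline below is a minimal rendition under default kernel settings, not the authors' implementation; m is a free parameter trading speed against fidelity.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def kasp(X, k, m=500, seed=0):
    """KASP-style fast approximate spectral clustering."""
    # Step 1: distortion-minimizing preprocessing: m local k-means centroids.
    km = KMeans(n_clusters=m, n_init=3, random_state=seed).fit(X)
    # Step 2: exact spectral clustering on the small centroid set (O(m^3)).
    sc = SpectralClustering(n_clusters=k, affinity="rbf", random_state=seed)
    centroid_labels = sc.fit_predict(km.cluster_centers_)
    # Step 3: each original point inherits its centroid's cluster label.
    return centroid_labels[km.labels_]
```

RASP follows the same template with step 1 replaced by a random projection tree.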

  • KDD - Fast approximate spectral clustering
    Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '09, 2009
    Co-Authors: Donghui Yan, Ling Huang, Michael I. Jordan
    Abstract:

    Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but which has limited applicability to large-scale problems due to its computational complexity of O(n³) in general, with n the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the mis-clustering rate. We develop two concrete instances of our general framework, one based on local k-means clustering (KASP) and one based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy. Specifically, our algorithms outperform k-means by a large margin in terms of accuracy, and run several times faster than approximate spectral clustering based on the Nyström method, with comparable accuracy and a significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to spectrally cluster data sets with a million observations within several minutes.

  • Spectral clustering with perturbed data
    Neural Information Processing Systems, 2008
    Co-Authors: Ling Huang, Donghui Yan, Nina Taft, Michael I. Jordan
    Abstract:

    Spectral clustering is useful for a wide-ranging set of applications in areas such as biological data analysis, image processing and data mining. However, the computational and/or communication resources required by the method in processing large-scale data are often prohibitively high, and practitioners are often required to perturb the original data in various ways (quantization, downsampling, etc.) before invoking a spectral algorithm. In this paper, we use stochastic perturbation theory to study the effects of data perturbation on the performance of spectral clustering. We show that the error under perturbation of spectral clustering is closely related to the perturbation of the eigenvectors of the Laplacian matrix. From this result we derive approximate upper bounds on the clustering error. We show that this bound is empirically tight across a wide range of problems, suggesting that it can be used in practical settings to determine the amount of data reduction allowed in order to meet a specification of permitted loss in clustering performance.
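
The quantity driving the paper's bound, namely how far the bottom-k Laplacian eigenvectors move under a data perturbation, is easy to measure empirically. The sketch below quantizes the data onto a coarse grid and reports the sine of the largest principal angle between the two k-dimensional eigenspaces; the bound itself is not coded here, and the quantizer and bandwidth are illustrative choices.

```python
import numpy as np

def laplacian_eigvecs(X, k, sigma=1.0):
    """Bottom-k eigenvectors of the normalized Laplacian of a Gaussian graph."""
    sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma**2))
    d = np.maximum(W.sum(1), 1e-12)
    L = np.eye(len(X)) - W / np.sqrt(np.outer(d, d))
    return np.linalg.eigh(L)[1][:, :k]

def eigenspace_shift(X, k, n_levels=8):
    """sin of the largest principal angle between original and quantized embeddings."""
    lo, hi = X.min(0), X.max(0)
    Xq = lo + np.round((X - lo) / (hi - lo + 1e-12) * n_levels) / n_levels * (hi - lo)
    U, Uq = laplacian_eigvecs(X, k), laplacian_eigvecs(Xq, k)
    s = np.linalg.svd(U.T @ Uq, compute_uv=False)   # cosines of principal angles
    return np.sqrt(1.0 - np.clip(s.min(), 0.0, 1.0) ** 2)
```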

  • Multiway Spectral Clustering: A Margin-Based Perspective
    Statistical Science, 2008
    Co-Authors: Zhihua Zhang, Michael I. Jordan
    Abstract:

    Spectral clustering is a broad class of clustering procedures in which an intractable combinatorial optimization formulation of clustering is “relaxed” into a tractable eigenvector problem, and in which the relaxed solution is subsequently “rounded” into an approximate discrete solution to the original problem. In this paper we present a novel margin-based perspective on multiway spectral clustering. We show that the margin-based perspective illuminates both the relaxation and rounding aspects of spectral clustering, providing a unified analysis of existing algorithms and guiding the design of new algorithms. We also present connections between spectral clustering and several other topics in statistics, specifically minimum-variance clustering, Procrustes analysis and Gaussian intrinsic autoregression.
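
The "rounding" half of the relax-then-round scheme can be sketched with the standard alternating discretization that the abstract's Procrustes connection suggests: rotate the relaxed eigenvector solution U, assign each row to its largest coordinate, then re-fit the rotation by orthogonal Procrustes. This is a common rounding procedure, not necessarily the paper's margin-based rule.

```python
import numpy as np

def round_embedding(U, n_rounds=30, seed=0):
    """Round a relaxed spectral solution U (n x k) to a discrete partition."""
    rng = np.random.default_rng(seed)
    n, k = U.shape
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    R = np.linalg.qr(rng.standard_normal((k, k)))[0]   # initial rotation
    labels = np.zeros(n, dtype=int)
    for _ in range(n_rounds):
        labels = np.argmax(U @ R, axis=1)              # discrete assignment
        Y = np.zeros((n, k))
        Y[np.arange(n), labels] = 1.0                  # indicator matrix
        # Orthogonal Procrustes: rotation best aligning U with Y.
        A, _, Bt = np.linalg.svd(U.T @ Y)
        R = A @ Bt
    return labels
```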

Yufei Han - One of the best experts on this subject based on the ideXlab platform.

  • IJCNN - Mini-batch spectral clustering
    2017 International Joint Conference on Neural Networks (IJCNN), 2017
    Co-Authors: Yufei Han, Maurizio Filippone
    Abstract:

    The cost of computing the spectrum of Laplacian matrices hinders the application of spectral clustering to large data sets. While approximations recover computational tractability, they can potentially affect clustering performance. This paper proposes a practical approach to learn spectral clustering, where the spectrum of the Laplacian is recovered by solving a constrained optimization problem using adaptive mini-batch-based stochastic gradient optimization on Stiefel manifolds. Crucially, the proposed approach is formulated so that the memory footprint of the algorithm is low, the cost of each iteration is linear in the number of samples, and convergence to critical points of the objective function is guaranteed. Extensive experimental validation on data sets with up to half a million samples demonstrates its scalability and its ability to outperform state-of-the-art approximate methods to learn spectral clustering for a given computational budget.

  • Mini-Batch spectral clustering
    arXiv: Machine Learning, 2016
    Co-Authors: Yufei Han, Maurizio Filippone
    Abstract:

    The cost of computing the spectrum of Laplacian matrices hinders the application of spectral clustering to large data sets. While approximations recover computational tractability, they can potentially affect clustering performance. This paper proposes a practical approach to learn spectral clustering based on adaptive stochastic gradient optimization. Crucially, the proposed approach recovers the exact spectrum of Laplacian matrices in the limit of the iterations, and the cost of each iteration is linear in the number of samples. Extensive experimental validation on data sets with up to half a million samples demonstrates its scalability and its ability to outperform state-of-the-art approximate methods to learn spectral clustering for a given computational budget.

Francis R. Bach - One of the best experts on this subject based on the ideXlab platform.

  • Learning spectral clustering
    Advances in Neural Information Processing Systems 16 (NIPS), 2004
    Co-Authors: Francis R. Bach, Michael I. Jordan
    Abstract:

    Spectral clustering refers to a class of techniques that rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive a new cost function for spectral clustering based on a measure of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing this cost function with respect to the partition leads to a new spectral clustering algorithm. Minimizing with respect to the similarity matrix leads to an algorithm for learning the similarity matrix. We develop a tractable approximation of our cost function that is based on the power method of computing eigenvectors.
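
The power-method ingredient mentioned at the end is the classical subspace (orthogonal) iteration; a minimal sketch is below, assuming a symmetric affinity matrix W. The differentiable cost the paper builds on top of this approximation is not reproduced here, and the iteration count is a free parameter.

```python
import numpy as np

def power_method_embedding(W, k, n_iters=100, seed=0):
    """Approximate the dominant k eigenvectors of D^{-1/2} W D^{-1/2}."""
    rng = np.random.default_rng(seed)
    d = np.maximum(W.sum(1), 1e-12)
    M = W / np.sqrt(np.outer(d, d))       # normalized affinity
    U, _ = np.linalg.qr(rng.standard_normal((W.shape[0], k)))
    for _ in range(n_iters):
        U, _ = np.linalg.qr(M @ U)        # one step of orthogonal iteration
    return U
```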