Hierarchical Clustering

The Experts below are selected from a list of 97,266 Experts worldwide, ranked by the ideXlab platform.

Sanjoy Dasgupta - One of the best experts on this subject based on the ideXlab platform.

  • ICML - Interactive Bayesian Hierarchical Clustering
    2016
    Co-Authors: Sharad Vikram, Sanjoy Dasgupta
    Abstract:

    Clustering is a powerful tool in data analysis, but it is often difficult to find a grouping that aligns with a user's needs. To address this, several methods incorporate constraints obtained from users into Clustering algorithms, but these methods unfortunately do not apply to Hierarchical Clustering. We design an interactive Bayesian algorithm that incorporates user interaction into Hierarchical Clustering while still utilizing the geometry of the data, by sampling from a constrained posterior distribution over hierarchies. We also suggest several ways to intelligently query a user. The algorithm, along with the querying schemes, shows promising results on real data.
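
A minimal sketch of the interactive loop the abstract describes, assuming triplet queries of the form "which two of a, b, c belong together?". The paper samples from a constrained posterior over hierarchies; the code below only illustrates collecting user answers as triplet constraints and checking them against an off-the-shelf average-linkage dendrogram, with a simulated oracle standing in for the user. All function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

def subtree_leaf_sets(root):
    """Yield the leaf-index set of every subtree in the dendrogram."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield set(node.pre_order(lambda n: n.id))
        if not node.is_leaf():
            stack.extend([node.get_left(), node.get_right()])

def satisfies(root, triplet):
    """Triplet (a, b | c): a and b should merge before either meets c,
    i.e. some subtree contains a and b but not c."""
    a, b, c = triplet
    return any(a in s and b in s and c not in s
               for s in subtree_leaf_sets(root))

def query_user(oracle, i, j, k):
    """Ask which pair belongs together; record the answer as a triplet."""
    pair = oracle(i, j, k)
    odd = ({i, j, k} - set(pair)).pop()
    return (*pair, odd)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
root = to_tree(linkage(X, method="average"))

def oracle(i, j, k):
    # Simulated user: points 0-14 vs 15-29 are the "true" groups.
    same = lambda a, b: (a < 15) == (b < 15)
    if same(i, j): return (i, j)
    if same(i, k): return (i, k)
    return (j, k)

constraints = [query_user(oracle, *rng.choice(30, 3, replace=False))
               for _ in range(10)]
violated = [t for t in constraints if not satisfies(root, t)]
print(f"{len(violated)} of {len(constraints)} constraints violated")
```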

  • Interactive Bayesian Hierarchical Clustering
    arXiv: Learning, 2016
    Co-Authors: Sharad Vikram, Sanjoy Dasgupta
    Abstract:

    Clustering is a powerful tool in data analysis, but it is often difficult to find a grouping that aligns with a user's needs. To address this, several methods incorporate constraints obtained from users into Clustering algorithms, but these methods unfortunately do not apply to Hierarchical Clustering. We design an interactive Bayesian algorithm that incorporates user interaction into Hierarchical Clustering while still utilizing the geometry of the data, by sampling from a constrained posterior distribution over hierarchies. We also suggest several ways to intelligently query a user. The algorithm, along with the querying schemes, shows promising results on real data.

  • Performance guarantees for Hierarchical Clustering
    Journal of Computer and System Sciences, 2005
    Co-Authors: Sanjoy Dasgupta, Philip M. Long
    Abstract:

    We show that for any data set in any metric space, it is possible to construct a Hierarchical Clustering with the guarantee that for every k, the induced k-Clustering has cost at most eight times that of the optimal k-Clustering. Here the cost of a Clustering is taken to be the maximum radius of its clusters. Our algorithm is similar in simplicity and efficiency to popular agglomerative heuristics for Hierarchical Clustering, and we show that these heuristics have unbounded approximation factors.
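
A minimal sketch of the standard primitive behind this construction: farthest-first traversal. Taking the first k traversal points as centers yields, for every k at once, a k-Clustering whose maximum radius is at most twice optimal (the classic k-center guarantee). These prefix clusterings are not nested; the paper's contribution is an extra level-and-parent step, omitted here, that turns the same ordering into a genuine hierarchy with the stated factor-of-eight guarantee. Function names are illustrative.

```python
import numpy as np

def farthest_first(X):
    """Order points so each new point is farthest from those chosen;
    radii[k] is the max radius when the first k points serve as centers."""
    n = len(X)
    order = [0]
    dist = np.linalg.norm(X - X[0], axis=1)
    radii = [np.inf]
    for _ in range(n - 1):
        i = int(np.argmax(dist))
        order.append(i)
        radii.append(float(dist[i]))
        dist = np.minimum(dist, np.linalg.norm(X - X[i], axis=1))
    return order, radii

def k_clustering(X, order, k):
    """Induce a k-Clustering: assign each point to its nearest center
    among the first k traversal points."""
    centers = X[np.array(order[:k])]
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
order, radii = farthest_first(X)
labels = k_clustering(X, order, k=5)
print("k=5 max radius:", radii[5])  # within 2x of the optimal 5-center radius
```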

  • COLT - Performance Guarantees for Hierarchical Clustering
    Lecture Notes in Computer Science, 2002
    Co-Authors: Sanjoy Dasgupta
    Abstract:

    We show that for any data set in any metric space, it is possible to construct a Hierarchical Clustering with the guarantee that for every k, the induced k-Clustering has cost at most eight times that of the optimal k-Clustering. Here the cost of a Clustering is taken to be the maximum radius of its clusters. Our algorithm is similar in simplicity and efficiency to common heuristics for Hierarchical Clustering, and we show that these heuristics have unbounded approximation factors.

Ruben H. Zamar - One of the best experts on this subject based on the ideXlab platform.

  • Multi-rank Sparse Hierarchical Clustering
    arXiv: Machine Learning, 2014
    Co-Authors: Hongyang Zhang, Ruben H. Zamar
    Abstract:

    There has been a surge in the number of large and flat data sets - data sets containing a large number of features and a relatively small number of observations - due to the growing ability to collect and store information in medical research and other fields. Hierarchical Clustering is a widely used Clustering tool. In Hierarchical Clustering, large and flat data sets may allow for better coverage of Clustering features (features that help explain the true underlying clusters), but such data sets usually include a large fraction of noise features (non-Clustering features) that may hide the underlying clusters. Witten and Tibshirani (2010) proposed a sparse Hierarchical Clustering framework to cluster the observations using an adaptively chosen subset of the features; however, we show that this framework has some limitations when the data sets contain Clustering features with complex structure. In this paper, we propose Multi-rank sparse Hierarchical Clustering (MrSHC). Using simulation studies and real data examples, we show that MrSHC produces superior feature selection and Clustering performance compared to classical (off-the-shelf) Hierarchical Clustering and the existing sparse Hierarchical Clustering framework.
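
For context, a minimal sketch of the Witten and Tibshirani (2010) style framework both this paper and the next start from: nonnegative feature weights, constrained to unit L2 norm and an L1 (lasso) budget s, are fit to the per-feature pairwise dissimilarities by soft-thresholding, and the tree is then grown on the weighted dissimilarity. This is a plausible reading of that baseline under stated assumptions, not the MrSHC method itself; function names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage

def soft(x, c):
    """Soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - c, 0)

def feature_weights(X, s):
    """Weights w >= 0 with ||w||_2 = 1 and ||w||_1 <= s, favoring
    features with large total pairwise dissimilarity."""
    p = X.shape[1]
    d = np.array([np.abs(X[:, j, None] - X[None, :, j]).sum()
                  for j in range(p)])
    lo, hi = 0.0, float(d.max())   # binary search on the threshold
    for _ in range(50):
        mid = (lo + hi) / 2
        w = soft(d, mid)
        w = w / (np.linalg.norm(w) + 1e-12)
        if w.sum() > s:
            lo = mid
        else:
            hi = mid
    return w

def sparse_hclust(X, s):
    """Average-linkage tree on the w-weighted dissimilarity matrix."""
    w = feature_weights(X, s)
    n, p = X.shape
    D = np.zeros((n, n))
    for j in range(p):
        D += w[j] * np.abs(X[:, j, None] - X[None, :, j])
    return w, linkage(squareform(D, checks=False), method="average")

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 200))
X[:20, :5] += 3.0                  # only the first 5 features cluster
w, Z = sparse_hclust(X, s=2.0)
print("selected features:", np.flatnonzero(w > 1e-8))
```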

  • A natural framework for sparse Hierarchical Clustering.
    arXiv: Machine Learning, 2014
    Co-Authors: Hongyang Zhang, Ruben H. Zamar
    Abstract:

    There has been a surge in the number of large and flat data sets - data sets containing a large number of features and a relatively small number of observations - due to the growing ability to collect and store information in medical research and other fields. Hierarchical Clustering is a widely used Clustering tool. In Hierarchical Clustering, large and flat data sets may allow for better coverage of Clustering features (features that help explain the true underlying clusters), but such data sets usually include a large fraction of noise features (non-Clustering features) that may hide the underlying clusters. Witten and Tibshirani (2010) proposed a sparse Hierarchical Clustering framework to cluster the observations using an adaptively chosen subset of the features; however, we show that this framework has some limitations when the data sets contain Clustering features with complex structure. In this paper, another sparse Hierarchical Clustering (SHC) framework is proposed. Using simulation studies and real data examples, we show that the proposed framework produces superior feature selection and Clustering performance compared to classical (off-the-shelf) Hierarchical Clustering and the existing sparse Hierarchical Clustering framework.

Hongyang Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Multi-rank Sparse Hierarchical Clustering
    arXiv: Machine Learning, 2014
    Co-Authors: Hongyang Zhang, Ruben H. Zamar
    Abstract:

    There has been a surge in the number of large and flat data sets - data sets containing a large number of features and a relatively small number of observations - due to the growing ability to collect and store information in medical research and other fields. Hierarchical Clustering is a widely used Clustering tool. In Hierarchical Clustering, large and flat data sets may allow for better coverage of Clustering features (features that help explain the true underlying clusters), but such data sets usually include a large fraction of noise features (non-Clustering features) that may hide the underlying clusters. Witten and Tibshirani (2010) proposed a sparse Hierarchical Clustering framework to cluster the observations using an adaptively chosen subset of the features; however, we show that this framework has some limitations when the data sets contain Clustering features with complex structure. In this paper, we propose Multi-rank sparse Hierarchical Clustering (MrSHC). Using simulation studies and real data examples, we show that MrSHC produces superior feature selection and Clustering performance compared to classical (off-the-shelf) Hierarchical Clustering and the existing sparse Hierarchical Clustering framework.

  • A natural framework for sparse Hierarchical Clustering.
    arXiv: Machine Learning, 2014
    Co-Authors: Hongyang Zhang, Ruben H. Zamar
    Abstract:

    There has been a surge in the number of large and flat data sets - data sets containing a large number of features and a relatively small number of observations - due to the growing ability to collect and store information in medical research and other fields. Hierarchical Clustering is a widely used Clustering tool. In Hierarchical Clustering, large and flat data sets may allow for better coverage of Clustering features (features that help explain the true underlying clusters), but such data sets usually include a large fraction of noise features (non-Clustering features) that may hide the underlying clusters. Witten and Tibshirani (2010) proposed a sparse Hierarchical Clustering framework to cluster the observations using an adaptively chosen subset of the features; however, we show that this framework has some limitations when the data sets contain Clustering features with complex structure. In this paper, another sparse Hierarchical Clustering (SHC) framework is proposed. Using simulation studies and real data examples, we show that the proposed framework produces superior feature selection and Clustering performance compared to classical (off-the-shelf) Hierarchical Clustering and the existing sparse Hierarchical Clustering framework.

Jian Chen - One of the best experts on this subject based on the ideXlab platform.

  • Towards understanding Hierarchical Clustering: A data distribution perspective
    Neurocomputing, 2009
    Co-Authors: Hui Xiong, Jian Chen
    Abstract:

    Hierarchical Clustering is a very important category of Clustering methods. Considerable research effort has been focused on algorithm-level improvements of the Hierarchical Clustering process. In this paper, our goal is to provide a systematic understanding of Hierarchical Clustering from a data distribution perspective. Specifically, we investigate how the "true" cluster distribution impacts Clustering performance, and how Hierarchical Clustering schemes relate to validation measures under different data distributions. To this end, we provide an organized study to illustrate these issues. Indeed, one of our key findings reveals that Hierarchical Clustering tends to produce clusters with high variation in cluster sizes regardless of the "true" cluster distribution. Also, our results show that the F-measure, an external Clustering validation measure, is biased toward Hierarchical Clustering algorithms that increase the variation in cluster sizes. In light of this, we propose F_norm, a normalized version of the F-measure, to address the cluster validation problem for Hierarchical Clustering. Experimental results show that F_norm is indeed more suitable than the unnormalized F-measure for evaluating Hierarchical Clustering results across data sets with different data distributions.
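
A minimal sketch of the external F-measure at issue, assuming the common class-to-best-cluster matching combined with class-size weights; the exact normalization used by F_norm is defined in the paper and not reproduced here. The toy comparison makes the bias concrete: of two deliberately imperfect Clusterings of the same labeled data, the one with highly skewed cluster sizes scores higher.

```python
import numpy as np

def f_measure(truth, labels):
    """Match each true class to its best-F cluster; weight by class size."""
    total = 0.0
    for c in np.unique(truth):
        in_c = truth == c
        best = 0.0
        for k in np.unique(labels):
            in_k = labels == k
            tp = np.sum(in_c & in_k)
            if tp == 0:
                continue
            prec, rec = tp / in_k.sum(), tp / in_c.sum()
            best = max(best, 2 * prec * rec / (prec + rec))
        total += in_c.mean() * best    # in_c.mean() is the class weight
    return total

truth = np.array([0] * 50 + [1] * 50)
balanced = np.array(([0] * 25 + [1] * 25) * 2)  # even cluster sizes
skewed = np.array([0] * 95 + [1] * 5)           # high size variation
print(f_measure(truth, balanced))   # 0.50
print(f_measure(truth, skewed))     # ~0.66: size skew is rewarded
```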

Chenjian - One of the best experts on this subject based on the ideXlab platform.