Pairwise Similarity

The Experts below are selected from a list of 5049 Experts worldwide, ranked by the ideXlab platform.

Zheng Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Discriminative dual-stream deep hashing for large-scale image retrieval
    Information Processing and Management, 2020
    Co-Authors: Yujuan Ding, Wai Keung Wong, Zheng Zhang
    Abstract:

    Deep hashing has become an important research topic in using deep learning to boost the performance of hash learning. Most existing deep supervised hashing methods focus on preserving similarity in hash coding by relying solely on pairwise supervision. However, such a pairwise similarity-preserving strategy cannot fully explore the semantic information in most cases, which results in information loss. To address this problem, this paper proposes a discriminative dual-stream deep hashing (DDDH) method, which integrates the pairwise similarity loss and the classification loss into a unified framework to take full advantage of label information. Specifically, the pairwise similarity loss aims to preserve the similarity and structural information of the high-dimensional original data, while the designed classification loss enlarges the margin between different classes, improving the discrimination of the learned binary codes. Moreover, an effective optimization algorithm is employed to train the hash code learning framework in an end-to-end manner. Extensive experiments on three image datasets demonstrate that our method is superior to several state-of-the-art deep and non-deep hashing methods, and ablation studies further show the effectiveness of introducing the classification loss into the overall hash learning framework.
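
    To make the two-loss design concrete, below is a minimal PyTorch sketch of a dual-stream objective in the spirit of DDDH: one stream produces relaxed binary codes whose scaled inner products should match a ±1 pairwise similarity matrix, and a second stream classifies the codes. The layer sizes, the tanh relaxation, and the weighting alpha are illustrative assumptions, not the paper's exact architecture.

    ```python
    # Minimal sketch (PyTorch) of a dual-stream hashing objective: a pairwise
    # similarity loss plus a classification loss. Layer sizes, the tanh
    # relaxation, and the weighting alpha are illustrative assumptions.
    import torch
    import torch.nn as nn

    class DualStreamHash(nn.Module):
        def __init__(self, feat_dim=512, n_bits=48, n_classes=10):
            super().__init__()
            self.hash_layer = nn.Linear(feat_dim, n_bits)  # stream 1: hash codes
            self.cls_layer = nn.Linear(n_bits, n_classes)  # stream 2: classifier

        def forward(self, feats):
            codes = torch.tanh(self.hash_layer(feats))     # relaxed codes in (-1, 1)
            return codes, self.cls_layer(codes)

    def combined_loss(codes, logits, labels, sim, alpha=1.0):
        # Pairwise loss: scaled inner products of codes should match the
        # +/-1 similarity matrix, e.g. for single-label data
        #   sim = 2.0 * (labels[:, None] == labels[None, :]).float() - 1.0
        pair_loss = ((codes @ codes.t() / codes.size(1) - sim) ** 2).mean()
        # Classification loss enlarges the margin between classes.
        cls_loss = nn.functional.cross_entropy(logits, labels)
        return pair_loss + alpha * cls_loss
    ```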

  • Improved Deep Hashing With Soft Pairwise Similarity for Multi-Label Image Retrieval
    IEEE Transactions on Multimedia, 2020
    Co-Authors: Zheng Zhang, Long Chen, Song Wang
    Abstract:

    Hash coding has been widely used in approximate nearest neighbor search for large-scale image retrieval. Recently, many deep hashing methods have been proposed and have shown largely improved performance over traditional feature-learning methods. Most of these methods examine the pairwise similarity on semantic-level labels, where the pairwise similarity is generally defined in a hard-assignment way: the similarity of an image pair is “1” if the two images share at least one class label and “0” if they share none. However, such a similarity definition cannot reflect the similarity ranking of image pairs that hold multiple labels. In this paper, an improved deep hashing method is proposed to enhance multi-label image retrieval. We introduce a pairwise quantified similarity calculated on the normalized semantic labels. Based on this, we divide the pairwise similarity into two situations, “hard similarity” and “soft similarity”, where cross-entropy loss and mean square error loss are applied respectively for more robust feature learning and hash coding. Experiments on four popular datasets demonstrate that the proposed method outperforms the competing methods and achieves state-of-the-art performance in multi-label image retrieval.
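
    A small sketch of the hard/soft split described above, assuming the quantified similarity is the cosine of the multi-hot label vectors (one plausible reading of "calculated on the normalized semantic labels"; the paper's exact definition may differ):

    ```python
    # Soft pairwise similarity from normalized multi-hot labels (NumPy).
    # Cosine of label vectors is an assumed reading of "normalized semantic
    # labels"; pairs then split into hard and soft cases.
    import numpy as np

    def quantified_similarity(labels):
        """labels: (n, c) multi-hot matrix. Returns (n, n) similarity in [0, 1]."""
        unit = labels / np.maximum(np.linalg.norm(labels, axis=1, keepdims=True), 1e-12)
        return unit @ unit.T

    labels = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
    S = quantified_similarity(labels)
    hard = np.isclose(S, 1.0) | np.isclose(S, 0.0)  # identical or disjoint labels
    soft = ~hard                                    # partial overlap: ranked similarity
    ```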

  • Scalable Supervised Asymmetric Hashing With Semantic and Latent Factor Embedding
    IEEE Transactions on Image Processing, 2019
    Co-Authors: Zheng Zhang, Wai Keung Wong, Zi Huang, Ling Shao
    Abstract:

    Compact hash code learning has been widely applied to fast similarity search owing to its significantly reduced storage and highly efficient query speed. However, it remains challenging to learn discriminative binary codes that fully preserve the pairwise similarities embedded in high-dimensional real-valued features, such that promising performance can be guaranteed. To overcome this difficulty, in this paper, we propose a novel scalable supervised asymmetric hashing (SSAH) method, which approximates the full pairwise similarity matrix by the asymmetric inner product of two different non-binary embeddings. In particular, to comprehensively explore the semantic information of the data, the supervised label information and the refined latent feature embedding are considered simultaneously to construct a high-quality hashing function and boost the discriminative power of the learned binary codes. Specifically, SSAH learns two distinctive hashing functions by jointly minimizing the regression loss on the semantic label alignment and the encoding loss on the refined latent features. More importantly, instead of using only part of the similarity correlations of the data, the full pairwise similarity matrix is utilized directly to avoid information loss and performance degeneration, and the cumbersome computational complexity of the $n \times n$ matrix is handled during the optimization phase. Furthermore, an efficient alternating optimization scheme with guaranteed convergence is designed to solve the resulting discrete optimization problem. Encouraging experimental results on diverse benchmark datasets demonstrate the superiority of the proposed SSAH method over many recently proposed hashing algorithms.
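
    The remark about handling the $n \times n$ matrix can be illustrated with a short NumPy sketch: when the similarity matrix factors through the labels, quantities such as the gradient of the fitting loss can be computed without ever materializing anything of size $n \times n$. The label-based construction S = 2*Ln@Ln.T - ones((n, n)) used below is a common choice and an assumption here, not necessarily SSAH's exact matrix.

    ```python
    # NumPy sketch: fit U @ B.T to the full n x n similarity matrix S without
    # materializing S. Assumes the common label-based construction
    # S = 2*Ln@Ln.T - ones((n, n)), which factors through the (n, c) labels.
    import numpy as np

    rng = np.random.default_rng(0)
    n, c, r = 10000, 20, 32                       # samples, classes, code bits
    L = rng.integers(0, 2, size=(n, c)).astype(float)
    Ln = L / np.maximum(np.linalg.norm(L, axis=1, keepdims=True), 1e-12)

    U = rng.standard_normal((n, r))               # real-valued embedding
    B = np.sign(rng.standard_normal((n, r)))      # binary codes

    # Gradient of ||S - U @ B.T||_F^2 wrt U is -2*S@B + 2*U@(B.T@B), and
    # S@B = 2*Ln@(Ln.T@B) - outer(ones(n), ones(n)@B): only (n, c) and
    # (n, r) arrays are ever formed, so memory stays O(n(c + r)).
    SB = 2 * Ln @ (Ln.T @ B) - np.outer(np.ones(n), np.ones(n) @ B)
    grad_U = -2 * SB + 2 * U @ (B.T @ B)
    ```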

  • SADIH: Semantic-Aware DIscrete Hashing
    arXiv: Computer Vision and Pattern Recognition, 2019
    Co-Authors: Zheng Zhang, Yang Li, Sheng Li, Zi Huang
    Abstract:

    Owing to its low storage cost and fast query speed, hashing has been widely adopted for similarity search in large-scale multimedia retrieval applications. In particular, supervised hashing has recently received considerable research attention, as it leverages label information to preserve the pairwise similarities of data points in the Hamming space. However, two crucial bottlenecks remain: 1) the learning process for full pairwise similarity preservation is computationally unaffordable and does not scale to big data; 2) the available category information is not well explored to learn discriminative hash functions. To overcome these challenges, we propose a unified Semantic-Aware DIscrete Hashing (SADIH) framework, which directly embeds the transformed semantic information into both the asymmetric similarity approximation and the discriminative hashing function learning. Specifically, a semantic-aware latent embedding is introduced to asymmetrically preserve the full pairwise similarities while handling the cumbersome n × n pairwise similarity matrix. Meanwhile, a semantic-aware autoencoder is developed to jointly preserve the data structures in the discriminative latent semantic space and perform data reconstruction. Moreover, an efficient alternating optimization algorithm is proposed to solve the resulting discrete optimization problem. Extensive experimental results on multiple large-scale datasets demonstrate that SADIH clearly outperforms state-of-the-art baselines with the additional benefit of lower computational costs.
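
    As a rough illustration of the "semantic-aware autoencoder" idea, the sketch below scores a latent embedding by how well it both reconstructs the data and regresses the labels. The linear encoder/decoder and the weight beta are hypothetical simplifications, not SADIH's actual formulation:

    ```python
    # Hypothetical sketch of a semantic-aware autoencoder objective: the
    # latent embedding Z must reconstruct the data X and regress the labels
    # L. Linear maps and the weight beta are simplifying assumptions.
    import numpy as np

    def semantic_autoencoder_objective(X, L, W_enc, W_dec, W_sem, beta=1.0):
        """X: (n, d) data, L: (n, c) labels, W_enc: (d, k), W_dec: (k, d), W_sem: (k, c)."""
        Z = X @ W_enc                                  # latent semantic embedding
        recon = np.linalg.norm(X - Z @ W_dec) ** 2     # data reconstruction
        align = np.linalg.norm(L - Z @ W_sem) ** 2     # semantic alignment
        return recon + beta * align
    ```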

  • AAAI - SADIH: Semantic-Aware DIscrete Hashing
    Proceedings of the AAAI Conference on Artificial Intelligence, 2019
    Co-Authors: Zheng Zhang, Yang Li, Sheng Li, Zi Huang
    Abstract:

    Owing to its low storage cost and fast query speed, hashing has been widely adopted for similarity search in large-scale multimedia retrieval applications. In particular, supervised hashing has recently received considerable research attention, as it leverages label information to preserve the pairwise similarities of data points in the Hamming space. However, two crucial bottlenecks remain: 1) the learning process for full pairwise similarity preservation is computationally unaffordable and does not scale to big data; 2) the available category information is not well explored to learn discriminative hash functions. To overcome these challenges, we propose a unified Semantic-Aware DIscrete Hashing (SADIH) framework, which directly embeds the transformed semantic information into both the asymmetric similarity approximation and the discriminative hashing function learning. Specifically, a semantic-aware latent embedding is introduced to asymmetrically preserve the full pairwise similarities while handling the cumbersome n × n pairwise similarity matrix. Meanwhile, a semantic-aware autoencoder is developed to jointly preserve the data structures in the discriminative latent semantic space and perform data reconstruction. Moreover, an efficient alternating optimization algorithm is proposed to solve the resulting discrete optimization problem. Extensive experimental results on multiple large-scale datasets demonstrate that SADIH clearly outperforms state-of-the-art baselines with the additional benefit of lower computational costs.

Samuel H Payne - One of the best experts on this subject based on the ideXlab platform.

  • Blazing Signature Filter: a library for fast pairwise similarity comparisons
    BMC Bioinformatics, 2018
    Co-Authors: Grant M Fujimoto, Ryan Wilson, H. Steven Wiley, Samuel H Payne
    Abstract:

    Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts that would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computation will have broad impact, as they allow scientists to more completely explore the wealth of scientific data. The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm that enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to use computationally efficient bit operators to provide a coarse measure of similarity. We demonstrate the utility of the algorithm on two common bioinformatics tasks: identifying datasets with similar gene expression profiles, and comparing annotated genomes. The BSF can scale to billions of comparisons without the need for specialized hardware.
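
    The bit-operator idea is easy to sketch: binarize each profile, pack the bits into machine words, and score a pair with XOR plus a bit count. Thresholding at zero is an illustrative encoding choice here; the published library's transformation may differ.

    ```python
    # Sketch of the BSF idea: binarize profiles, pack bits into words, and
    # score a pair with XOR + popcount. Thresholding at zero is an
    # illustrative encoding; the library's transformation may differ.
    import numpy as np

    def to_signature(profile):
        """Pack a real-valued profile into a binary signature (one byte per 8 features)."""
        return np.packbits((np.asarray(profile) > 0).astype(np.uint8))

    def signature_similarity(sig_a, sig_b, n_features):
        """Fraction of matching bits, via XOR and a bit count."""
        mismatches = np.unpackbits(np.bitwise_xor(sig_a, sig_b))[:n_features].sum()
        return 1.0 - mismatches / n_features

    a = to_signature([0.5, -1.2, 3.0, -0.1, 2.2])
    b = to_signature([0.7, -0.4, 1.0, 0.2, -0.9])
    print(signature_similarity(a, b, n_features=5))   # coarse similarity in [0, 1]
    ```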

  • Blazing Signature Filter: a library for fast pairwise similarity comparisons
    bioRxiv, 2017
    Co-Authors: Grant M Fujimoto, Ryan Wilson, H. Steven Wiley, Samuel H Payne
    Abstract:

    Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts that would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computation will have broad impact, as they allow scientists to more completely explore the wealth of scientific data. A significant practical drawback of large-scale data mining is that the vast majority of pairwise comparisons are unlikely to be relevant, meaning that they do not share a signature of interest. It is therefore essential to identify these unproductive comparisons as rapidly as possible and exclude them from more time-intensive similarity calculations. The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm that enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to use computationally efficient bit operators to provide a coarse measure of similarity. As a result, the BSF can scale to high dimensionality and rapidly filter out unproductive pairwise comparisons. Two bioinformatics applications of the tool demonstrate its ability to scale to billions of pairwise comparisons and the usefulness of this approach.

Thomas S Huang - One of the best experts on this subject based on the ideXlab platform.

  • Discriminative Similarity for Clustering and Semi-Supervised Learning
    arXiv: Machine Learning, 2017
    Co-Authors: Yingzhen Yang, Feng Liang, Nebojsa Jojic, Jiashi Feng, Thomas S Huang
    Abstract:

    Similarity-based clustering and semi-supervised learning methods separate the data into clusters or classes according to the pairwise similarity between data points, so the pairwise similarity is crucial to their performance. In this paper, we propose a novel discriminative similarity learning framework that learns a discriminative similarity for either data clustering or semi-supervised learning. The proposed framework learns a classifier from each hypothetical labeling and searches for the optimal labeling by minimizing the generalization error of the learned classifiers associated with the hypothetical labelings. A kernel classifier is employed in our framework. Through generalization analysis via Rademacher complexity, the generalization error bound for the kernel classifier learned from a hypothetical labeling is expressed as the sum of the pairwise similarity between data points from different classes, parameterized by the weights of the kernel classifier. This pairwise similarity serves as the discriminative similarity for clustering and semi-supervised learning, and a discriminative similarity of similar form can also be induced by the integrated squared error bound for kernel density classification. Based on the discriminative similarity induced by the kernel classifier, we propose new clustering and semi-supervised learning methods.
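
    A toy NumPy illustration of the central quantity: score each hypothetical two-class labeling by the summed kernel similarity between points placed in different classes, and keep the labeling that minimizes it. Brute-force search and the Gaussian kernel bandwidth are for exposition only.

    ```python
    # Toy illustration: pick the hypothetical labeling minimizing the summed
    # kernel similarity between cross-class pairs. Brute force and the
    # Gaussian bandwidth are for exposition only.
    import itertools
    import numpy as np

    X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.1], [2.9, 3.0]])
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * 0.5 ** 2))                  # pairwise kernel similarity

    best = None
    for labeling in itertools.product([0, 1], repeat=len(X)):
        y = np.array(labeling)
        if y.min() == y.max():
            continue                                  # skip one-cluster labelings
        cross = K[y[:, None] != y[None, :]].sum()     # similarity across classes
        if best is None or cross < best[0]:
            best = (cross, y)
    print(best[1])                                    # [0 0 1 1] (or its flip)
    ```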

  • On a Theory of Nonparametric Pairwise Similarity for Clustering: Connecting Clustering to Classification
    Neural Information Processing Systems, 2014
    Co-Authors: Yingzhen Yang, Feng Liang, Zhangyang Wang, Thomas S Huang
    Abstract:

    Pairwise clustering methods partition the data space into clusters based on the pairwise similarity between data points, and the success of pairwise clustering largely depends on the pairwise similarity function defined over the data points, where kernel similarity is broadly used. In this paper, we present a novel pairwise clustering framework that bridges the gap between clustering and multi-class classification. The framework learns an unsupervised nonparametric classifier from each data partition and searches for the optimal partition of the data by minimizing the generalization error of the learned classifiers associated with the data partitions. We consider two nonparametric classifiers in this framework, i.e., the nearest neighbor classifier and the plug-in classifier. Modeling the underlying data distribution by nonparametric kernel density estimation, the generalization error bounds for both unsupervised nonparametric classifiers are the sum of nonparametric pairwise similarity terms between the data points, for the purpose of clustering. Under a uniform distribution, the nonparametric similarity terms induced by both unsupervised classifiers exhibit a well-known form of kernel similarity. We also prove that the generalization error bound for the unsupervised plug-in classifier is asymptotically equal to the weighted volume of the cluster boundary [1] for Low Density Separation, a widely used criterion for semi-supervised learning and clustering. Based on the derived nonparametric pairwise similarity using the plug-in classifier, we propose a new nonparametric exemplar-based clustering method with enhanced discriminative capability, whose superiority is evidenced by the experimental results.
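
    Under the kernel density model, the induced pairwise similarity reduces to the familiar Gaussian-kernel form. A minimal sketch, assuming Silverman's rule for the KDE bandwidth (an assumption, not the paper's choice):

    ```python
    # Gaussian-kernel pairwise similarity with a KDE-style bandwidth
    # (Silverman's rule; an assumption, not the paper's choice).
    import numpy as np

    def kde_similarity(X):
        n, d = X.shape
        h = X.std(axis=0).mean() * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * h ** 2))
    ```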

Zi Huang - One of the best experts on this subject based on the ideXlab platform.

  • Scalable Supervised Asymmetric Hashing With Semantic and Latent Factor Embedding
    IEEE Transactions on Image Processing, 2019
    Co-Authors: Zheng Zhang, Wai Keung Wong, Zi Huang, Ling Shao
    Abstract:

    Compact hash code learning has been widely applied to fast similarity search owing to its significantly reduced storage and highly efficient query speed. However, it remains challenging to learn discriminative binary codes that fully preserve the pairwise similarities embedded in high-dimensional real-valued features, such that promising performance can be guaranteed. To overcome this difficulty, in this paper, we propose a novel scalable supervised asymmetric hashing (SSAH) method, which approximates the full pairwise similarity matrix by the asymmetric inner product of two different non-binary embeddings. In particular, to comprehensively explore the semantic information of the data, the supervised label information and the refined latent feature embedding are considered simultaneously to construct a high-quality hashing function and boost the discriminative power of the learned binary codes. Specifically, SSAH learns two distinctive hashing functions by jointly minimizing the regression loss on the semantic label alignment and the encoding loss on the refined latent features. More importantly, instead of using only part of the similarity correlations of the data, the full pairwise similarity matrix is utilized directly to avoid information loss and performance degeneration, and the cumbersome computational complexity of the $n \times n$ matrix is handled during the optimization phase. Furthermore, an efficient alternating optimization scheme with guaranteed convergence is designed to solve the resulting discrete optimization problem. Encouraging experimental results on diverse benchmark datasets demonstrate the superiority of the proposed SSAH method over many recently proposed hashing algorithms.

  • SADIH: Semantic-Aware DIscrete Hashing
    arXiv: Computer Vision and Pattern Recognition, 2019
    Co-Authors: Zheng Zhang, Yang Li, Sheng Li, Zi Huang
    Abstract:

    Owing to its low storage cost and fast query speed, hashing has been widely adopted for similarity search in large-scale multimedia retrieval applications. In particular, supervised hashing has recently received considerable research attention, as it leverages label information to preserve the pairwise similarities of data points in the Hamming space. However, two crucial bottlenecks remain: 1) the learning process for full pairwise similarity preservation is computationally unaffordable and does not scale to big data; 2) the available category information is not well explored to learn discriminative hash functions. To overcome these challenges, we propose a unified Semantic-Aware DIscrete Hashing (SADIH) framework, which directly embeds the transformed semantic information into both the asymmetric similarity approximation and the discriminative hashing function learning. Specifically, a semantic-aware latent embedding is introduced to asymmetrically preserve the full pairwise similarities while handling the cumbersome n × n pairwise similarity matrix. Meanwhile, a semantic-aware autoencoder is developed to jointly preserve the data structures in the discriminative latent semantic space and perform data reconstruction. Moreover, an efficient alternating optimization algorithm is proposed to solve the resulting discrete optimization problem. Extensive experimental results on multiple large-scale datasets demonstrate that SADIH clearly outperforms state-of-the-art baselines with the additional benefit of lower computational costs.

  • AAAI - SADIH: Semantic-Aware DIscrete Hashing
    Proceedings of the AAAI Conference on Artificial Intelligence, 2019
    Co-Authors: Zheng Zhang, Yang Li, Sheng Li, Zi Huang
    Abstract:

    Owing to its low storage cost and fast query speed, hashing has been widely adopted for similarity search in large-scale multimedia retrieval applications. In particular, supervised hashing has recently received considerable research attention, as it leverages label information to preserve the pairwise similarities of data points in the Hamming space. However, two crucial bottlenecks remain: 1) the learning process for full pairwise similarity preservation is computationally unaffordable and does not scale to big data; 2) the available category information is not well explored to learn discriminative hash functions. To overcome these challenges, we propose a unified Semantic-Aware DIscrete Hashing (SADIH) framework, which directly embeds the transformed semantic information into both the asymmetric similarity approximation and the discriminative hashing function learning. Specifically, a semantic-aware latent embedding is introduced to asymmetrically preserve the full pairwise similarities while handling the cumbersome n × n pairwise similarity matrix. Meanwhile, a semantic-aware autoencoder is developed to jointly preserve the data structures in the discriminative latent semantic space and perform data reconstruction. Moreover, an efficient alternating optimization algorithm is proposed to solve the resulting discrete optimization problem. Extensive experimental results on multiple large-scale datasets demonstrate that SADIH clearly outperforms state-of-the-art baselines with the additional benefit of lower computational costs.

  • Discrete Hashing With Multiple Supervision
    IEEE Transactions on Image Processing, 2019
    Co-Authors: Peng-fei Zhang, Zi Huang, Xin-shun Xu
    Abstract:

    Supervised hashing methods have achieved more promising results than unsupervised ones by leveraging label information to generate compact and accurate hash codes. Most prior supervised hashing methods construct an n × n instance-pairwise similarity matrix, where n is the number of training samples. However, this kind of similarity matrix incurs a high memory cost and makes the optimization time-consuming, which is unacceptable in many real applications. In addition, most of these methods relax the discrete constraints to solve the optimization problem, which may cause large quantization errors and ultimately lead to poor performance. To address these limitations, we present a novel hashing method, named discrete hashing with multiple supervision (MSDH). MSDH supervises hash code learning with both class-wise and instance-class similarity matrices, whose space cost is much lower than that of the instance-pairwise similarity matrix; with this multiple supervision information, better hash codes can be learned. Furthermore, an iterative optimization algorithm is proposed to learn the discrete hash codes directly instead of relaxing the binary constraints. Experimental results on several widely used benchmark datasets demonstrate that MSDH outperforms several state-of-the-art methods.
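
    The space-cost argument is easy to quantify: the instance-pairwise matrix is n × n, while the two supervision matrices MSDH uses are only n × c and c × c. The concrete constructions below (labels as the instance-class matrix, an identity class-wise matrix) are illustrative assumptions:

    ```python
    # Quantifying the space cost: instance-pairwise supervision is n x n,
    # while MSDH-style supervision needs only n x c and c x c. The concrete
    # matrices below (labels, identity) are illustrative assumptions.
    import numpy as np

    n, c = 100_000, 50
    L = np.random.default_rng(0).integers(0, 2, size=(n, c)).astype(np.float32)

    inst_class = L                                  # n x c instance-class similarity
    class_wise = np.eye(c, dtype=np.float32)        # c x c class-wise similarity

    print(n * n * 4 / 1e9, "GB for the n x n matrix (float32)")          # ~40 GB
    print((inst_class.nbytes + class_wise.nbytes) / 1e6, "MB for MSDH")  # ~20 MB
    ```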

A.K. Jain - One of the best experts on this subject based on the ideXlab platform.

  • Semi-Supervised Clustering by Input Pattern Assisted Pairwise Similarity Matrix Completion
    International Conference on Machine Learning, 2013
    Co-Authors: Jinfeng Yi, Lijun Zhang, Qi Qian, A.K. Jain
    Abstract:

    Many semi-supervised clustering algorithms have been proposed to improve clustering accuracy by effectively exploiting the available side information, which is usually given in the form of pairwise constraints. However, existing semi-supervised clustering algorithms have two main shortcomings. First, they must solve non-convex optimization problems, leading to clustering results that are sensitive to the initialization. Second, none of these algorithms comes with a theoretical guarantee on clustering performance. We address these limitations by developing a framework for semi-supervised clustering based on input pattern assisted matrix completion. The key idea is to cast clustering as a matrix completion problem and to solve it efficiently by exploiting the correlation between input patterns and cluster assignments. Our analysis shows that under appropriate conditions, only O(log n) pairwise constraints are needed to accurately recover the true cluster partition. We verify the effectiveness of the proposed algorithm by comparing it with state-of-the-art semi-supervised clustering algorithms on several benchmark datasets.
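
    A minimal sketch of the "clustering as matrix completion" casting: treat must-link and cannot-link constraints as observed 1/0 entries of the pairwise similarity matrix and recover the rest with a generic low-rank completion loop (iterative SVD imputation). This is a stand-in that omits the paper's distinctive use of input patterns:

    ```python
    # Generic stand-in: complete a partially observed similarity matrix
    # (1 = must-link, 0 = cannot-link) by iterative SVD imputation. The
    # paper's algorithm additionally exploits the input patterns.
    import numpy as np

    def complete_similarity(S_obs, mask, rank=2, n_iters=100):
        """S_obs: (n, n) with constraint entries; mask: True where observed."""
        S = np.where(mask, S_obs, 0.0)
        for _ in range(n_iters):
            U, s, Vt = np.linalg.svd(S, full_matrices=False)
            low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-r fit
            S = np.where(mask, S_obs, low_rank)              # keep observed entries
        return S
    ```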

  • Learning Pairwise Similarity for Data Clustering
    18th International Conference on Pattern Recognition (ICPR'06), 2006
    Co-Authors: A.L.N. Fred, A.K. Jain
    Abstract:

    Each clustering algorithm induces a similarity between given data points, according to its underlying clustering criterion. Given the large number of available clustering techniques, one is faced with the following questions: (a) Which measure of similarity should be used in a given clustering problem? (b) Should the same similarity measure be used throughout the d-dimensional feature space? In other words, are the underlying clusters in the given data of similar shape? Our goal is to learn the pairwise similarity between points in order to facilitate a proper partitioning of the data without a priori knowledge of k, the number of clusters, or of the shape of these clusters. We explore a clustering ensemble approach combined with cluster stability criteria to selectively learn the similarity from a collection of different clustering algorithms with various parameter configurations.
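
    The ensemble idea sketched below is the classic evidence-accumulation recipe: run a base clusterer many times with varying parameters, count how often each pair of points co-occurs in a cluster, and treat that co-association matrix as the learned pairwise similarity. Fixing the final number of clusters (rather than selecting it via cluster-stability criteria, as the entry describes) is a simplification, and scikit-learn >= 1.2 is assumed for the `metric` parameter.

    ```python
    # Evidence-accumulation sketch: co-association across many k-means runs
    # becomes the learned pairwise similarity, which is then clustered.
    # Assumes scikit-learn >= 1.2 (for the `metric` parameter).
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering, KMeans

    def ensemble_similarity(X, ks=range(2, 11), runs_per_k=5, seed=0):
        rng = np.random.default_rng(seed)
        co = np.zeros((len(X), len(X)))
        total = 0
        for k in ks:
            for _ in range(runs_per_k):
                labels = KMeans(n_clusters=k, n_init=1,
                                random_state=int(rng.integers(1 << 31))).fit_predict(X)
                co += labels[:, None] == labels[None, :]   # same-cluster co-occurrence
                total += 1
        return co / total                                  # pairwise similarity in [0, 1]

    X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 5.0])
    S = ensemble_similarity(X)
    parts = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                    linkage="average").fit_predict(1.0 - S)
    ```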