K-Means Clustering

The experts below are selected from a list of 132,861 experts worldwide, ranked by the ideXlab platform.

Xie Guang-qiang - One of the best experts on this subject based on the ideXlab platform.

  • Research on Differential Privacy Preserving K-Means Clustering
    Computer Science, 2013
    Co-Authors: Xie Guang-qiang
    Abstract:

    We studied privacy-preserving K-Means clustering within the framework of differential privacy. We first reviewed the state of research on privacy-preserving data mining and privacy-preserving clustering, briefly presenting the basic principle and methods of differential privacy. To improve the poor clustering utility of differentially private K-Means, we presented a new method, IDP K-Means clustering, and proved that it satisfies ε-differential privacy. Our experiments show that, at the same level of privacy protection, IDP K-Means clustering achieves much higher clustering utility than the differentially private K-Means clustering method.
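
    The baseline that this abstract improves on, differentially private K-Means, can be illustrated with a short sketch: Laplace noise is added to the per-cluster counts and coordinate sums in each Lloyd iteration so that the released centroids satisfy ε-differential privacy. The snippet below is a generic illustration under assumed settings (features scaled to [0, 1], an evenly split privacy budget, a non-private initialisation); it is not the paper's IDP K-Means algorithm.

    import numpy as np

    def dp_kmeans(X, k, epsilon, n_iters=5, seed=0):
        """Baseline differentially private k-means sketch (assumes X scaled to [0, 1])."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        centers = X[rng.choice(n, k, replace=False)].astype(float)  # non-private init, kept simple
        eps_iter = epsilon / n_iters            # split the privacy budget across iterations
        for _ in range(n_iters):
            labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
            for j in range(k):
                members = X[labels == j]
                # Noisy count (sensitivity 1) and noisy per-coordinate sum (L1 sensitivity d),
                # each using half of this iteration's budget.
                count = len(members) + rng.laplace(scale=2.0 / eps_iter)
                total = members.sum(axis=0) if len(members) else np.zeros(d)
                total = total + rng.laplace(scale=2.0 * d / eps_iter, size=d)
                if count > 1:
                    centers[j] = np.clip(total / count, 0.0, 1.0)
        return centers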

Huanhuan Chen - One of the best experts on this subject based on the ideXlab platform.

  • Multiple Kernel K-Means Clustering by Selecting Representative Kernels
    IEEE transactions on neural networks and learning systems, 2020
    Co-Authors: Yaqiang Yao, Bingbing Jiang, Huanhuan Chen
    Abstract:

    To cluster data that are not linearly separable in the original feature space, K-Means clustering was extended to a kernel version. However, the performance of kernel K-Means clustering depends heavily on the choice of kernel function. To mitigate this problem, multiple kernel learning has been introduced into K-Means clustering to obtain an optimal kernel combination for clustering. Despite the success of multiple kernel K-Means clustering in various scenarios, few of the existing works update the combination coefficients based on the diversity of the kernels, so the selected kernels are often highly redundant, which degrades clustering performance and efficiency. We resolve this problem from the perspective of subset selection in this article. In particular, we first propose an effective strategy to select a diverse subset of the prespecified kernels as representative kernels, and then incorporate the subset-selection process into the framework of multiple kernel K-Means clustering. The representative kernels are indicated by significant combination weights. Due to the nonconvexity of the resulting objective function, we develop an alternating minimization method to optimize the combination coefficients of the selected kernels and the cluster memberships alternately. In particular, an efficient optimization method is developed to reduce the time complexity of optimizing the kernel combination weights. Finally, extensive experiments on benchmark and real-world data sets demonstrate the effectiveness and superiority of our approach in comparison with existing methods.

  • Multiple Kernel $k$-Means Clustering by Selecting Representative Kernels
    arXiv: Learning, 2018
    Co-Authors: Yaqiang Yao, Huanhuan Chen
    Abstract:

    To cluster data that are not linearly separable in the original feature space, $k$-means clustering was extended to a kernel version. However, the performance of kernel $k$-means clustering depends heavily on the choice of kernel function. To mitigate this problem, multiple kernel learning has been introduced into $k$-means clustering to obtain an optimal kernel combination for clustering. Despite the success of multiple kernel $k$-means clustering in various scenarios, few of the existing works update the combination coefficients based on the diversity of the kernels, so the selected kernels are often highly redundant, which degrades clustering performance and efficiency. In this paper, we propose a simple but efficient strategy that selects a diverse subset of the pre-specified kernels as representative kernels, and then incorporates the subset-selection process into the framework of multiple kernel $k$-means clustering. The representative kernels are indicated by significant combination weights. Due to the non-convexity of the resulting objective function, we develop an alternating minimization method to optimize the combination coefficients of the selected kernels and the cluster memberships alternately. We evaluate the proposed approach on several benchmark and real-world datasets. The experimental results demonstrate the competitiveness of our approach in comparison with state-of-the-art methods.
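
    The two entries above describe selecting representative kernels inside multiple kernel k-means. A minimal sketch of the underlying kernel-combination idea is given below: several precomputed kernel matrices are mixed with non-negative weights and kernel k-means is run on the combined kernel. The uniform weighting and plain Lloyd-style assignment are assumptions for illustration only; the paper's representative-kernel selection and alternating optimisation of the weights are not reproduced here.

    import numpy as np

    def kernel_kmeans(K, k, n_iters=20, seed=0):
        """Lloyd-style kernel k-means on a precomputed kernel matrix K."""
        rng = np.random.default_rng(seed)
        n = K.shape[0]
        labels = rng.integers(0, k, size=n)
        diag = np.diag(K)
        for _ in range(n_iters):
            dist = np.empty((n, k))
            for c in range(k):
                mask = labels == c
                m = mask.sum()
                if m == 0:
                    dist[:, c] = np.inf
                    continue
                # ||phi(x_i) - mu_c||^2 = K_ii - (2/|c|) sum_j K_ij + (1/|c|^2) sum_{j,l} K_jl
                dist[:, c] = diag - 2.0 * K[:, mask].sum(1) / m + K[np.ix_(mask, mask)].sum() / m**2
            new_labels = dist.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        return labels

    def combined_kernel(kernels, weights=None):
        """Mix a list of precomputed kernel matrices with non-negative weights (uniform by default)."""
        weights = np.ones(len(kernels)) / len(kernels) if weights is None else np.asarray(weights)
        return sum(w * K for w, K in zip(weights, kernels))

    # Hypothetical usage: mix an RBF and a linear kernel computed elsewhere, then cluster.
    # labels = kernel_kmeans(combined_kernel([K_rbf, K_lin]), k=3)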

Yoshikazu Terada - One of the best experts on this subject based on the ideXlab platform.

  • Strong Consistency of Reduced K-Means Clustering
    Scandinavian Journal of Statistics, 2014
    Co-Authors: Yoshikazu Terada
    Abstract:

    type="main" xml:id="sjos12074-abs-0001"> Reduced K-Means Clustering is a method for Clustering objects in a low-dimensional subspace. The advantage of this method is that both Clustering of objects and low-dimensional subspace reflecting the cluster structure are simultaneously obtained. In this paper, the relationship between conventional K-Means Clustering and reduced K-Means Clustering is discussed. Conditions ensuring almost sure convergence of the estimator of reduced K-Means Clustering as unboundedly increasing sample size have been presented. The results for a more general model considering conventional K-Means Clustering and reduced K-Means Clustering are provided in this paper. Moreover, a consistent selection of the numbers of clusters and dimensions is described.

  • Strong Consistency of Reduced K-Means Clustering
    arXiv: Statistics Theory, 2012
    Co-Authors: Yoshikazu Terada
    Abstract:

    Reduced K-Means clustering is a method for clustering objects in a low-dimensional subspace. Its advantage is that the clustering of objects and a low-dimensional subspace reflecting the cluster structure are obtained simultaneously. In this paper, the relationship between conventional K-Means clustering and reduced K-Means clustering is discussed. Conditions ensuring almost sure convergence of the reduced K-Means estimator as the sample size increases unboundedly are presented. Results for a more general model covering both conventional K-Means clustering and reduced K-Means clustering are also provided. Moreover, a new criterion and a consistent estimator of it are proposed to determine the optimal subspace dimension for a given number of clusters.
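
    The two entries above concern the consistency of reduced K-Means rather than its computation, but the estimator itself can be sketched: cluster memberships and a low-dimensional projection are estimated jointly by alternating k-means on the projected data with an orthogonal Procrustes update of the projection. The snippet below is a generic illustration under simple assumptions (centred data, fixed numbers of clusters k and dimensions q); it does not implement the paper's selection criteria.

    import numpy as np
    from sklearn.cluster import KMeans

    def reduced_kmeans(X, k, q, n_iters=20, seed=0):
        """Alternating sketch of reduced k-means: cluster in a q-dimensional subspace."""
        X = X - X.mean(axis=0)                      # centre the data
        # Initialise the p x q projection with the leading right singular vectors (PCA).
        A = np.linalg.svd(X, full_matrices=False)[2][:q].T
        for _ in range(n_iters):
            Y = X @ A                               # project into the q-dimensional subspace
            km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Y)
            labels = km.labels_
            M = km.cluster_centers_[labels]         # n x q matrix of assigned centroids
            # Orthogonal Procrustes update: A maximising trace(A' X' M).
            U, _, Vt = np.linalg.svd(X.T @ M, full_matrices=False)
            A = U @ Vt
        return labels, A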

D. Sculley - One of the best experts on this subject based on the ideXlab platform.

  • WWW - Web-scale K-Means Clustering
    Proceedings of the 19th international conference on World wide web - WWW '10, 2010
    Co-Authors: D. Sculley
    Abstract:

    We present two modifications to the popular K-Means clustering algorithm to address the extreme latency, scalability, and sparsity requirements encountered in user-facing web applications. First, we propose the use of mini-batch optimization for K-Means clustering. This reduces the computational cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent. Second, we achieve sparsity with projected gradient descent and give a fast ε-accurate projection onto the L1-ball. Source code is freely available: http://code.google.com/p/sofia-ml
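
    The mini-batch update at the heart of this paper can be sketched in a few lines: each iteration draws a small random batch, assigns its points to their nearest centres, and moves each centre with a per-centre learning rate of 1/count. The batch size and iteration count below are illustrative, and the paper's L1-ball projection for sparse centres is omitted; scikit-learn's MiniBatchKMeans provides a production implementation of this algorithm.

    import numpy as np

    def minibatch_kmeans(X, k, batch_size=100, n_iters=100, seed=0):
        """Mini-batch k-means sketch with per-centre learning rates."""
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        centers = X[rng.choice(n, k, replace=False)].astype(float)
        counts = np.zeros(k)                                  # per-centre update counts
        for _ in range(n_iters):
            batch = X[rng.choice(n, batch_size)]              # sample a mini-batch
            nearest = np.argmin(((batch[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
            for x, c in zip(batch, nearest):
                counts[c] += 1
                eta = 1.0 / counts[c]                         # per-centre learning rate
                centers[c] = (1.0 - eta) * centers[c] + eta * x  # convex combination step
        return centers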