Clustering Problem

The Experts below are selected from a list of 360 Experts worldwide ranked by ideXlab platform

Charu C. Aggarwal - One of the best experts on this subject based on the ideXlab platform.

  • A Survey of Text Clustering Algorithms
    Mining Text Data, 2012
    Co-Authors: Charu C. Aggarwal, Cheng Xiang Zhai
    Abstract:

    Clustering is a widely studied data mining problem in the text domain. The problem finds numerous applications in customer segmentation, classification, collaborative filtering, visualization, document organization, and indexing. In this chapter, we will provide a detailed survey of the problem of text clustering. We will study the key challenges of the clustering problem, as it applies to the text domain. We will discuss the key methods used for text clustering, and their relative advantages. We will also discuss a number of recent advances in the area in the context of social network and linked data.

  • On clustering massive text and categorical data streams
    Knowledge and Information Systems, 2009
    Co-Authors: Charu C. Aggarwal
    Abstract:

    In this paper, we will study the data stream clustering problem in the context of text and categorical data domains. While the clustering problem has been studied recently for numeric data streams, the problems of text and categorical data present different challenges because of the large and unordered nature of the corresponding attributes. Therefore, we will propose algorithms for text and categorical data stream clustering. We will propose a condensation-based approach for stream clustering which summarizes the stream into a number of fine-grained cluster droplets. These summarized droplets can be used in conjunction with a variety of user queries to construct the clusters for different input parameters. Thus, this provides an online analytical processing approach to stream clustering. We also study the problem of detecting noisy and outlier records in real time. We will test the approach on a number of real and synthetic data sets, and show the effectiveness of the method over the baseline OSKM algorithm for stream clustering.

  • A framework for clustering massive text and categorical data streams
    SIAM International Conference on Data Mining, 2006
    Co-Authors: Charu C. Aggarwal
    Abstract:

    Many applications such as news group filtering, text crawling, and document organization require real-time clustering and segmentation of text data records. The categorical data stream clustering problem also has a number of applications to the problems of customer segmentation and real-time trend analysis. We will present an online approach for clustering massive text and categorical data streams with the use of a statistical summarization methodology. We present results illustrating the effectiveness of the technique.
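
The summarization idea in the two stream clustering abstracts above can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the `Droplet` class, the cosine similarity measure, the `threshold` value, and the fixed `max_droplets` budget are all assumptions made for the example. Each droplet keeps only additive statistics (summed term counts and a record count), so a record can be absorbed in one pass without storing the stream.

```python
import math
from collections import Counter


class Droplet:
    """Additive summary of the records absorbed by one fine-grained cluster."""

    def __init__(self):
        self.term_weights = Counter()  # summed term frequencies of absorbed records
        self.size = 0                  # number of absorbed records

    def absorb(self, record):
        self.term_weights.update(record)
        self.size += 1


def cosine(a, b):
    """Cosine similarity between two sparse term-count mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_stream(records, max_droplets=3, threshold=0.3):
    """Assign each incoming token list to the closest droplet in a single pass.

    Hypothetical policy: open a new droplet while the budget allows and the
    best similarity is below `threshold`; otherwise absorb into the closest.
    """
    droplets = []
    for tokens in records:
        record = Counter(tokens)
        best = max(droplets, key=lambda d: cosine(record, d.term_weights), default=None)
        if best is not None and cosine(record, best.term_weights) >= threshold:
            best.absorb(record)
        elif len(droplets) < max_droplets:
            fresh = Droplet()
            fresh.absorb(record)
            droplets.append(fresh)
        else:
            best.absorb(record)  # budget exhausted: fall back to the closest droplet
    return droplets
```

Because the summaries are additive, droplets could later be merged or re-clustered offline for different user parameters, which is the kind of query flexibility the journal abstract describes.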

Jiannong Cao - One of the best experts on this subject based on the ideXlab platform.

  • Dynamic genetic algorithms for the dynamic load balanced clustering problem in mobile ad hoc networks
    Expert Systems With Applications, 2013
    Co-Authors: Hui Cheng, Shengxiang Yang, Jiannong Cao
    Abstract:

    Clustering can help aggregate the topology information and reduce the size of routing tables in a mobile ad hoc network (MANET). To achieve fairness and uniform energy consumption, each clusterhead should ideally support the same number of cluster members. However, a MANET is a dynamic and complex system, and one of its important characteristics is topology dynamics, that is, the network topology changes over time due to factors such as energy conservation and node movement. Therefore, in a MANET, an effective clustering algorithm should efficiently adapt to each topology change and produce the new load balanced clusterhead set quickly. The maintenance of the cluster structure should aim to keep it as stable as possible to reduce overhead. To meet this requirement, the new solution should keep as many good parts of the previous solution as possible. In this paper, we first formulate the dynamic load balanced clustering problem (DLBCP) as a dynamic optimization problem. Then, we propose to use a series of dynamic genetic algorithms (GAs) to solve the DLBCP in MANETs. In these dynamic GAs, each individual represents a feasible clustering structure and its fitness is evaluated based on the load balance metric. Various dynamics handling techniques are introduced to help the population deal with the topology changes and produce closely related solutions of good quality. The experimental results show that these GAs can work well for the DLBCP and outperform traditional GAs that do not consider dynamic network optimization requirements.

Hui Cheng - One of the best experts on this subject based on the ideXlab platform.

  • Dynamic genetic algorithms for the dynamic load balanced clustering problem in mobile ad hoc networks
    Expert Systems With Applications, 2013
    Co-Authors: Hui Cheng, Shengxiang Yang, Jiannong Cao
    Abstract:

    Clustering can help aggregate the topology information and reduce the size of routing tables in a mobile ad hoc network (MANET). To achieve fairness and uniform energy consumption, each clusterhead should ideally support the same number of cluster members. However, a MANET is a dynamic and complex system, and one of its important characteristics is topology dynamics, that is, the network topology changes over time due to factors such as energy conservation and node movement. Therefore, in a MANET, an effective clustering algorithm should efficiently adapt to each topology change and produce the new load balanced clusterhead set quickly. The maintenance of the cluster structure should aim to keep it as stable as possible to reduce overhead. To meet this requirement, the new solution should keep as many good parts of the previous solution as possible. In this paper, we first formulate the dynamic load balanced clustering problem (DLBCP) as a dynamic optimization problem. Then, we propose to use a series of dynamic genetic algorithms (GAs) to solve the DLBCP in MANETs. In these dynamic GAs, each individual represents a feasible clustering structure and its fitness is evaluated based on the load balance metric. Various dynamics handling techniques are introduced to help the population deal with the topology changes and produce closely related solutions of good quality. The experimental results show that these GAs can work well for the DLBCP and outperform traditional GAs that do not consider dynamic network optimization requirements.

  • Genetic algorithms with hyper mutation for dynamic load balanced clustering problem in mobile ad hoc networks
    International Conference on Natural Computation, 2012
    Co-Authors: Hui Cheng
    Abstract:

    Clustering can help aggregate the topology information and reduce the size of routing tables in a mobile ad hoc network (MANET). To achieve fairness and even energy consumption, each clusterhead should ideally support the same number of cluster members. Moreover, one of the most important characteristics of MANETs is topology dynamics, that is, the network topology changes over time due to energy conservation or node mobility. Therefore, for a dynamic and complex system like a MANET, an effective clustering algorithm should efficiently adapt to each topology change and produce the new load balanced solution quickly. In this paper, we propose to use two types of hyper-mutation genetic algorithms (GAs) to solve the dynamic load balanced clustering problem in MANETs. In the GA population, each individual represents a feasible clustering structure and its fitness is evaluated based on the load balance metric. The two GAs are named hlHMGA and grHMGA, respectively. The experimental results show that both algorithms work well for the problem when appropriate parameters are identified and that hlHMGA outperforms grHMGA.
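
The abstracts above say that each GA individual encodes a clustering structure, that fitness follows a load balance metric, and that hyper-mutation is used to react to topology changes, but they do not give the formulas. The sketch below fills those gaps with assumptions: an individual is a list mapping each node index to a clusterhead id, the metric is the variance of members per clusterhead, and hyper-mutation simply switches to a higher mutation rate when a change is detected. The function names and rate values are hypothetical, and radio-range feasibility checks are left out.

```python
import random
from collections import Counter


def load_variance(assignment, heads):
    """Population variance of members per clusterhead; 0.0 means perfectly balanced.

    Assumed stand-in for the load balance metric (lower is better).
    """
    counts = Counter(assignment)
    loads = [counts.get(h, 0) for h in heads]
    mean = sum(loads) / len(loads)
    return sum((x - mean) ** 2 for x in loads) / len(loads)


def mutate(assignment, heads, rate, rng):
    """Reassign each node to a random clusterhead with probability `rate`."""
    return [rng.choice(heads) if rng.random() < rate else h for h in assignment]


def mutation_rate(topology_changed, base_rate=0.02, hyper_rate=0.5):
    """Hyper-mutation: use a raised rate right after a detected topology change."""
    return hyper_rate if topology_changed else base_rate


if __name__ == "__main__":
    rng = random.Random(42)
    heads = [0, 1]
    individual = [0, 0, 0, 1]          # three nodes on head 0, one on head 1
    print("variance before:", load_variance(individual, heads))
    child = mutate(individual, heads, mutation_rate(topology_changed=True), rng)
    print("variance after hyper-mutation:", load_variance(child, heads))
```

A full GA would wrap these pieces in selection and crossover and would reject or repair individuals that assign a node to a clusterhead outside its range.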

Wing Hung Wong - One of the best experts on this subject based on the ideXlab platform.

  • Integrative analysis of single cell genomics data by coupled nonnegative matrix factorizations
    Proceedings of the National Academy of Sciences of the United States of America, 2018
    Co-Authors: Zhana Duren, Howard Y. Chang, Ansuman T. Satpathy, Xi Chen, Mahdi Zamanighomi, Wanwen Zeng, Yong Wang, Wing Hung Wong
    Abstract:

    When different types of functional genomics data are generated on single cells from different samples of cells from the same heterogeneous population, the clustering of cells in the different samples should be coupled. We formulate this "coupled clustering" problem as an optimization problem and propose the method of coupled nonnegative matrix factorizations (coupled NMF) for its solution. The method is illustrated by the integrative analysis of single-cell RNA-sequencing (RNA-seq) and single-cell ATAC-sequencing (ATAC-seq) data.

  • Integrative analysis of single cell genomics data by coupled nonnegative matrix factorizations
    bioRxiv, 2018
    Co-Authors: Zhana Duren, Howard Y. Chang, Ansuman T. Satpathy, Xi Chen, Mahdi Zamanighomi, Wanwen Zeng, Yong Wang, Wing Hung Wong
    Abstract:

    When different types of functional genomics data are generated on single cells from different samples of cells from the same heterogeneous population, the clustering of cells in the different samples should be coupled. We formulate this coupled clustering problem as an optimization problem, and propose the method of coupled nonnegative matrix factorizations (coupled NMF) for its solution. The method is illustrated by the integrative analysis of single cell RNA-seq and single cell ATAC-seq data.
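
As background for the abstracts above, the sketch below shows plain single-matrix NMF with the standard multiplicative updates, used here to cluster the columns (cells) of a nonnegative matrix. It is not the coupled NMF method itself: the coupling term that links the two data types is not described in the abstract and is omitted. The helper names, the random initialization, and the rule of labelling each cell by its largest factor loading are assumptions for illustration.

```python
import random


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0])))


def nmf(x, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix X (features x cells) as W (features x rank) times H (rank x cells)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(n)]
    return w, h


def assign_clusters(h):
    """Label each column (cell) with the index of its largest loading in H."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]
```

In a coupled setting, one such factorization would be fitted per data type with an additional term tying the factors together, so that the cluster labels derived from each sample correspond to each other.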

Prasanta K Jana - One of the best experts on this subject based on the ideXlab platform.

  • A novel evolutionary approach for load balanced clustering problem for wireless sensor networks
    Swarm and Evolutionary Computation, 2013
    Co-Authors: Pratyay Kuila, Suneet Kumar Gupta, Prasanta K Jana
    Abstract:

    Clustering sensor nodes is an effective topology control method to reduce the energy consumption of the sensor nodes and maximize the lifetime of Wireless Sensor Networks (WSNs). However, in a cluster based WSN, the leaders (cluster heads) bear some extra load for various activities such as data collection, data aggregation and communication of the aggregated data to the base station. Therefore, balancing the load of the cluster heads is a challenging issue for the long run operation of WSNs. Load balanced clustering is known to be an NP-hard problem for a WSN with unequal load of the sensor nodes. The Genetic Algorithm (GA) is one of the most popular evolutionary approaches that can be applied to find a fast and efficient solution to such problems. In this paper, we propose a novel GA based load balanced clustering algorithm for WSNs. The proposed algorithm is shown to perform well for both equal and unequal load of the sensor nodes. We perform extensive simulation of the proposed method and compare the results with some evolutionary based approaches and other related clustering algorithms. The results demonstrate that the proposed algorithm performs better than all such algorithms in terms of various performance metrics such as load balancing, execution time, energy consumption, number of active sensor nodes, number of active cluster heads and the rate of convergence.