UPGMA

The Experts below are selected from a list of 29826 Experts worldwide ranked by ideXlab platform

Pietro Laface - One of the best experts on this subject based on the ideXlab platform.

  • Exact memory–constrained UPGMA for large scale speaker clustering
    Pattern Recognition, 2019
    Co-Authors: Sandro Cumani, Pietro Laface
    Abstract:

    This work focuses on clustering large sets of utterances collected from an unknown number of speakers. Since the number of speakers is unknown, we focus on exact hierarchical agglomerative clustering, followed by automatic selection of the number of clusters. Exact hierarchical clustering of a large number of vectors, however, is a challenging task due to memory constraints, which make it ineffective or infeasible for large datasets. We propose an exact memory–constrained and parallel implementation of average linkage clustering for large-scale datasets, showing that its computational complexity is approximately $O(N^2)$ but that it is much faster (up to 40 times in our experiments) than the Reciprocal Nearest Neighbor chain algorithm, which also has $O(N^2)$ complexity. We also propose a very fast silhouette computation procedure that, in linear time, determines the set of clusters. The computational efficiency of our approach is demonstrated on datasets including up to 4 million speaker vectors.
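
The pipeline this abstract describes (average linkage clustering, then a silhouette-based choice of the number of clusters) can be illustrated with a naive sketch. This is not the authors' memory-constrained or parallel method, nor their linear-time silhouette procedure: it recomputes average distances from the definition and is only practical for toy inputs. The function names `average_linkage_levels`, `mean_silhouette`, and `select_partition` are hypothetical, and Euclidean distance is an assumption.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_levels(points):
    """Naive average linkage: return the partition at every merge level.

    Each partition is a list of clusters; each cluster is a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    levels = [[c[:] for c in clusters]]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean of all pairwise distances between two clusters.
            total = sum(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        levels.append([c[:] for c in clusters])
    return levels


def mean_silhouette(points, partition):
    """Mean silhouette of a partition with at least two clusters.

    Points in singleton clusters contribute 0, the usual convention.
    Assumes no two points coincide, so max(a, b) is never zero.
    """
    total = 0.0
    for ci, cluster in enumerate(partition):
        if len(cluster) == 1:
            continue
        for i in cluster:
            a = sum(euclidean(points[i], points[j])
                    for j in cluster if j != i) / (len(cluster) - 1)
            b = min(sum(euclidean(points[i], points[j]) for j in other) / len(other)
                    for cj, other in enumerate(partition) if cj != ci)
            total += (b - a) / max(a, b)
    return total / len(points)


def select_partition(points):
    """Pick the hierarchy level (2 to N-1 clusters) with the highest mean silhouette."""
    levels = average_linkage_levels(points)
    candidates = [p for p in levels if 2 <= len(p) < len(points)]
    return max(candidates, key=lambda p: mean_silhouette(points, p))


if __name__ == "__main__":
    demo = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(select_partition(demo))
```

The sketch keeps every level of the hierarchy in memory and rescans all cluster pairs at each merge, which is exactly the kind of cost the paper's approach is designed to avoid.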

Sandro Cumani - One of the best experts on this subject based on the ideXlab platform.

  • Exact memory–constrained UPGMA for large scale speaker clustering
    Pattern Recognition, 2019
    Co-Authors: Sandro Cumani, Pietro Laface
    Abstract:

    This work focuses on clustering large sets of utterances collected from an unknown number of speakers. Since the number of speakers is unknown, we focus on exact hierarchical agglomerative clustering, followed by automatic selection of the number of clusters. Exact hierarchical clustering of a large number of vectors, however, is a challenging task due to memory constraints, which make it ineffective or infeasible for large datasets. We propose an exact memory–constrained and parallel implementation of average linkage clustering for large-scale datasets, showing that its computational complexity is approximately $O(N^2)$ but that it is much faster (up to 40 times in our experiments) than the Reciprocal Nearest Neighbor chain algorithm, which also has $O(N^2)$ complexity. We also propose a very fast silhouette computation procedure that, in linear time, determines the set of clusters. The computational efficiency of our approach is demonstrated on datasets including up to 4 million speaker vectors.

Allen G Rodrigo - One of the best experts on this subject based on the ideXlab platform.

  • Reconstructing genealogies of serial samples under the assumption of a molecular clock using serial-sample UPGMA
    Molecular Biology and Evolution, 2000
    Co-Authors: Alexei J Drummond, Allen G Rodrigo
    Abstract:

    Reconstruction of evolutionary relationships from noncontemporaneous molecular samples provides a new challenge for phylogenetic reconstruction methods. With recent biotechnological advances there has been an increase in molecular sequencing throughput, and the potential to obtain serial samples of sequences from populations, including rapidly evolving pathogens, is fast being realized. A new method called the serial-sample unweighted pair grouping method with arithmetic means (sUPGMA) is presented that reconstructs a genealogy or phylogeny of sequences sampled serially in time using a matrix of pairwise distances. The resulting tree depicts the terminal lineages of each sample ending at a different level consistent with the sample's temporal order. Since sUPGMA is a variant of UPGMA, it will perform best when sequences have evolved at a constant rate (i.e., according to a molecular clock). On simulated data, this new method performs better than standard cluster analysis under a variety of longitudinal sampling strategies. Serial-sample UPGMA is particularly useful for analysis of longitudinal samples of viruses and bacteria, as well as ancient DNA samples, with the minimal requirement that samples of sequences be ordered in time.

Che-lun Hung - One of the best experts on this subject based on the ideXlab platform.

  • Efficient parallel UPGMA algorithm based on multiple GPUs
    2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2016
    Co-Authors: Che-lun Hung, Fu-che Wu, Yu-wei Chan
    Abstract:

    A phylogenetic tree is used to present the evolutionary relationships among the interesting biological species based on the similarities in their genetic sequences. The UPGMA is one of the popular algorithms to construct a phylogenetic tree according to the distance matrix created by the pairwise distances among taxa. To solve the performance issue of the UPGMA, the implementation of the UPGMA method on a single GPU has been proposed. However, it is not capable of handling the large taxa set. This work describes a novel parallel UPGMA approach on multiple GPUs that is able to build a tree from extremely large datasets. The experimental results show that the proposed approach with 4 NVIDIA GTX 980 achieves an approximately × fold speedup over the implementation of UPGMA on CPU and GPU, respectively.

  • GPU‐UPGMA: high‐performance computing for UPGMA algorithm based on graphics processing units
    Concurrency and Computation: Practice and Experience, 2014
    Co-Authors: Che-lun Hung, Yehching Chung
    Abstract:

    Constructing phylogenetic trees is of priority concern in computational biology, especially for developing biological taxonomies. As a conventional means of constructing phylogenetic trees, the unweighted pair group method with arithmetic mean (UPGMA) is also an extensively adopted heuristic algorithm for constructing ultrametric trees (UT). Although the UT constructed by UPGMA is often not a true tree unless the molecular clock assumption holds, UT is still useful for clocklike data. Moreover, UT has been successfully adopted in other problems, including orthologous-domain classification and multiple sequence alignment. However, previous implementations of the UPGMA method have a limited ability to handle large taxa sets efficiently. This work describes a novel graphics processing unit (GPU)-UPGMA approach, capable of providing rapid construction of extremely large datasets for biologists. Experimental results indicate that the proposed GPU-UPGMA approach achieves an approximately 95× speedup ratio on an NVIDIA Tesla C2050 GPU over the implementation with a 2.13 GHz CPU. The techniques developed in GPU-UPGMA can also be applied in the future to solve the classification problem for large data sets with more than tens of thousands of items. Copyright © 2014 John Wiley & Sons, Ltd.

  • Parallel UPGMA Algorithm on Graphics Processing Units Using CUDA
    2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software, 2012
    Co-Authors: Yu-rong Chen, Che-lun Hung
    Abstract:

    The construction of phylogenetic trees is important for computational biology, especially for the development of biological taxonomies. UPGMA is one of the most popular heuristic algorithms for constructing ultrametric trees (UT). Although the UT constructed by UPGMA is often not a true tree unless the molecular clock assumption holds, the UT is still useful for clocklike data. However, a fundamental problem with previous implementations of this method is their limited ability to handle large taxa sets within a reasonable time. In this paper, we present GPU-UPGMA, which can provide a fast construction of very large datasets for biologists. Experimental results show that GPU-UPGMA obtains about a 95-times speedup on an NVIDIA Tesla C2050 GPU over the 2.13 GHz CPU implementation.
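
For readers unfamiliar with the algorithm these abstracts accelerate, the following is a minimal sequential sketch of textbook UPGMA on a distance matrix. It is not the GPU implementation from any of the papers above. The names `upgma`, `leaves`, and `cophenetic` and the nested-tuple tree representation are assumptions made for illustration.

```python
from itertools import combinations


def upgma(labels, d):
    """Build a UPGMA tree from a symmetric distance matrix d (list of lists).

    Internal nodes are tuples (left, right, height); leaves are plain labels,
    which must not themselves be tuples.
    """
    # Active clusters: id -> (subtree, number of leaves).
    nodes = {i: (labels[i], 1) for i in range(len(labels))}
    dist = {frozenset((i, j)): d[i][j]
            for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(nodes) > 1:
        pair = min(dist, key=dist.get)      # closest pair of active clusters
        i, j = sorted(pair)
        height = dist.pop(pair) / 2         # the new node sits halfway between them
        (ti, ni), (tj, nj) = nodes.pop(i), nodes.pop(j)
        for k in nodes:
            dik = dist.pop(frozenset((i, k)))
            djk = dist.pop(frozenset((j, k)))
            # Size-weighted update, equal to the mean over all leaf pairs.
            dist[frozenset((next_id, k))] = (ni * dik + nj * djk) / (ni + nj)
        nodes[next_id] = ((ti, tj, height), ni + nj)
        next_id += 1
    tree, _ = next(iter(nodes.values()))
    return tree


def leaves(tree):
    """List the leaf labels below a node."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def cophenetic(tree, out=None):
    """Tree distance for each leaf pair: twice the height of their common ancestor."""
    out = {} if out is None else out
    if isinstance(tree, tuple):
        left, right, height = tree
        for a in leaves(left):
            for b in leaves(right):
                out[frozenset((a, b))] = 2 * height
        cophenetic(left, out)
        cophenetic(right, out)
    return out


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    matrix = [[0, 2, 5, 10],
              [2, 0, 7, 10],
              [5, 7, 0, 10],
              [10, 10, 10, 0]]
    print(upgma(names, matrix))
```

In the demo matrix, A and B are at different distances from C (5 and 7), yet the tree places both at the same distance from C. That is the ultrametric property the abstracts refer to, and it is why the result matches the true tree only for clocklike data.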

Seth Sullivant - One of the best experts on this subject based on the ideXlab platform.

  • Distance-Based Phylogenetic Methods Around a Polytomy
    IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2014
    Co-Authors: Ruth Davidson, Seth Sullivant
    Abstract:

    Distance-based phylogenetic algorithms attempt to solve the NP-hard least-squares phylogeny problem by mapping an arbitrary dissimilarity map representing biological data to a tree metric. The set of all dissimilarity maps is a Euclidean space properly containing the space of all tree metrics as a polyhedral fan. Outputs of distance-based tree reconstruction algorithms such as UPGMA and neighbor-joining are points in the maximal cones in the fan. Tree metrics with polytomies lie at the intersections of maximal cones. A phylogenetic algorithm divides the space of all dissimilarity maps into regions based upon which combinatorial tree is reconstructed by the algorithm. Comparison of phylogenetic methods can be done by comparing the geometry of these regions. We use polyhedral geometry to compare the local nature of the subdivisions induced by least-squares phylogeny, UPGMA, and neighbor-joining when the true tree has a single polytomy with exactly four neighbors. Our results suggest that in some circumstances, UPGMA and neighbor-joining poorly match least-squares phylogeny.

  • Distance-based phylogenetic methods around a polytomy
    arXiv: Populations and Evolution, 2013
    Co-Authors: Ruth Davidson, Seth Sullivant
    Abstract:

    Distance-based phylogenetic algorithms attempt to solve the NP-hard least squares phylogeny problem by mapping an arbitrary dissimilarity map representing biological data to a tree metric. The set of all dissimilarity maps is a Euclidean space properly containing the space of all tree metrics as a polyhedral fan. Outputs of distance-based tree reconstruction algorithms such as UPGMA and Neighbor-Joining are points in the maximal cones in the fan. Tree metrics with polytomies lie at the intersections of maximal cones. A phylogenetic algorithm divides the space of all dissimilarity maps into regions based upon which combinatorial tree is reconstructed by the algorithm. Comparison of phylogenetic methods can be done by comparing the geometry of these regions. We use polyhedral geometry to compare the local nature of the subdivisions induced by least squares phylogeny, UPGMA, and Neighbor-Joining. Our results suggest that in some circumstances, UPGMA and Neighbor-Joining poorly match least squares phylogeny when the true tree has a polytomy.

  • Polyhedral combinatorics of UPGMA cones
    Advances in Applied Mathematics, 2013
    Co-Authors: Ruth Davidson, Seth Sullivant
    Abstract:

    Distance-based methods such as UPGMA (Unweighted Pair Group Method with Arithmetic Mean) continue to play a significant role in phylogenetic research. We use polyhedral combinatorics to analyze the natural subdivision of the positive orthant induced by classifying the input vectors according to tree topologies returned by the algorithm. The partition lattice informs the study of UPGMA trees. We give a closed form for the extreme rays of UPGMA cones on n taxa, and compute the spherical volumes of the UPGMA cones for small n.

  • Polyhedral combinatorics of UPGMA cones
    arXiv: Populations and Evolution, 2012
    Co-Authors: Ruth Davidson, Seth Sullivant
    Abstract:

    Distance-based methods such as UPGMA (Unweighted Pair Group Method with Arithmetic Mean) continue to play a significant role in phylogenetic research. We use polyhedral combinatorics to analyze the natural subdivision of the positive orthant induced by classifying the input vectors according to tree topologies returned by the algorithm. The partition lattice informs the study of UPGMA trees. We give a closed form for the extreme rays of UPGMA cones on n taxa, and compute the normalized volumes of the UPGMA cones for small n. Keywords: phylogenetic trees, polyhedral combinatorics, partition lattice
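
One property behind the term "cone" is easy to check numerically: multiplying a dissimilarity vector by a positive constant does not change the sequence of merges UPGMA performs. The sketch below tests only that scaling property on a toy input; it does not reproduce the paper's extreme-ray or volume computations. The function name `upgma_merges` and the dictionary input format are hypothetical.

```python
from itertools import combinations


def upgma_merges(n, d):
    """Return the clusters created by UPGMA, in merge order.

    d maps frozenset({i, j}) to the dissimilarity between leaves i and j,
    for leaves 0..n-1. Each merge is reported as a frozenset of leaf indices.
    """
    def avg(a, b):
        # Mean dissimilarity over all leaf pairs spanning the two clusters.
        return sum(d[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: avg(*p))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append(a | b)
    return merges


def scaled(d, factor):
    """Multiply every dissimilarity by a positive factor."""
    return {pair: factor * value for pair, value in d.items()}


if __name__ == "__main__":
    values = {(0, 1): 2, (0, 2): 5, (0, 3): 9, (1, 2): 7, (1, 3): 11, (2, 3): 10}
    d = {frozenset(pair): v for pair, v in values.items()}
    print(upgma_merges(4, d) == upgma_merges(4, scaled(d, 2.5)))
```

A vector with a different closest pair, for example one where leaves 2 and 3 are nearest, produces a different merge sequence and so falls in a different region of the subdivision.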