Similarity Function

The Experts below are selected from a list of 104976 Experts worldwide ranked by ideXlab platform

Gerbrand Ceder - One of the best experts on this subject based on the ideXlab platform.

  • Proposed definition of crystal substructure and substructural similarity
    Physical Review B, 2014
    Co-Authors: Lusann Yang, Stephen Dacek, Gerbrand Ceder
    Abstract:

    There is a clear need for a practical and mathematically rigorous description of local structure in inorganic compounds, so that structures and chemistries can be easily compared across large data sets. Here a method for decomposing crystal structures into substructures is given, and a similarity function between those substructures is defined. The similarity function is based on both geometric and chemical similarity. This construction allows for large-scale data mining of substructural properties, and for the analysis of substructures and void spaces within crystal structures. The method is validated via the prediction of Li-ion intercalation sites for oxides. Tested on databases of known Li-ion-containing oxides, the method reproduces all Li-ion sites in an oxide, with a maximum of four incorrect guesses, 80% of the time.
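    The combination of a geometric and a chemical term can be sketched as follows. This is a minimal toy illustration, not the paper's actual functional form: the bond-length RMSD comparison, the Jaccard species overlap, and the 50/50 weighting are all illustrative assumptions.

```python
import math

def geometric_similarity(bond_lengths_a, bond_lengths_b):
    """Toy geometric term: compare the sorted bond lengths of two
    coordination environments (hypothetical form, not the paper's)."""
    a, b = sorted(bond_lengths_a), sorted(bond_lengths_b)
    if len(a) != len(b):
        return 0.0  # different coordination numbers: treat as dissimilar
    rmsd = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
    return math.exp(-rmsd)  # map a distance to a similarity in (0, 1]

def chemical_similarity(species_a, species_b):
    """Toy chemical term: Jaccard overlap of the species sets."""
    sa, sb = set(species_a), set(species_b)
    return len(sa & sb) / len(sa | sb)

def substructure_similarity(sub_a, sub_b, w_geo=0.5):
    """Combine both terms; the equal weighting is an illustrative choice."""
    g = geometric_similarity(sub_a["bonds"], sub_b["bonds"])
    c = chemical_similarity(sub_a["species"], sub_b["species"])
    return w_geo * g + (1 - w_geo) * c

# A regular Li-O octahedron vs. a slightly distorted one:
octahedron = {"bonds": [2.0] * 6, "species": ["Li", "O"]}
distorted  = {"bonds": [1.9, 2.0, 2.0, 2.0, 2.0, 2.1], "species": ["Li", "O"]}
print(substructure_similarity(octahedron, distorted))
```

    Identical substructures score 1.0, and the score decays smoothly as the geometry is distorted or the chemistry diverges, which is the property a substructural data-mining pipeline needs.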

  • Data-mined similarity function between material compositions
    Physical Review B, 2013
    Co-Authors: Lusann Yang, Gerbrand Ceder
    Abstract:

    A new method for assessing the similarity of material compositions is described. A similarity measure is important for the classification and clustering of compositions. The similarity of two material compositions is calculated using a data-mined ionic substitutional similarity, based on the probability with which two ions will substitute for each other within the same structure prototype. The method is validated via the prediction of crystal-structure prototypes for oxides from the Inorganic Crystal Structure Database, selecting the correct prototype from a list of known prototypes within five guesses 75% of the time. It performs particularly well on quaternary oxides, selecting the correct prototype on the first guess 65% of the time.
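    The core idea, a composition similarity built from pairwise ionic substitution probabilities, can be sketched as below. The probability table and the greedy one-to-one ion matching are illustrative assumptions; the paper mines its substitution probabilities from the ICSD and uses its own functional form.

```python
# Toy substitution probabilities (illustrative numbers, not the
# data-mined values from the paper).
SUBSTITUTION_PROB = {
    ("Na+", "Li+"): 0.40, ("K+", "Li+"): 0.15,
    ("Mg2+", "Ca2+"): 0.35,
}

def sub_prob(ion_a, ion_b):
    """Symmetric lookup; identical ions substitute with probability 1."""
    if ion_a == ion_b:
        return 1.0
    return SUBSTITUTION_PROB.get((ion_a, ion_b),
           SUBSTITUTION_PROB.get((ion_b, ion_a), 0.0))

def composition_similarity(comp_a, comp_b):
    """Greedily match ions of one composition to ions of the other,
    scoring each matched pair by its substitution probability.
    (A simplified stand-in for the paper's measure.)"""
    if len(comp_a) != len(comp_b):
        return 0.0
    remaining = list(comp_b)
    score = 1.0
    for ion in comp_a:
        best = max(remaining, key=lambda other: sub_prob(ion, other))
        score *= sub_prob(ion, best)
        remaining.remove(best)
    return score

print(composition_similarity(["Li+", "O2-"], ["Na+", "O2-"]))  # → 0.4
```

    LiO-like and NaO-like compositions come out similar because Na+ frequently substitutes for Li+, which is exactly the signal a prototype-prediction scheme can rank candidates by.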

Fengchun Tian - One of the best experts on this subject based on the ideXlab platform.

  • Bilateral similarity function: a novel and universal method for similarity analysis of biological sequences.
    Journal of Theoretical Biology, 2010
    Co-Authors: Shiyuan Wang, Fengchun Tian
    Abstract:

    In this paper, a bilateral similarity function is designed for analyzing the similarities of biological sequences such as DNA, RNA secondary structures, or proteins. The defined function performs comprehensive comparisons between sequences remarkably well, accounting both for the Hamming distance between the two compared sequences and for the corresponding location differences. Compared with existing methods for similarity analysis, the examination of similarities/dissimilarities shows that the proposed method, with computational complexity O(N), is effective for all three kinds of biological sequences and applies universally to them.
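    The two ingredients, a Hamming-distance term and a positional term, can be combined in a single O(N) pass, as in the sketch below. The exact functional form here (normalized mismatch count plus length difference) is an illustrative assumption, not the paper's definition.

```python
def bilateral_similarity(seq_a, seq_b):
    """Toy O(N) score combining (i) the Hamming distance over the
    overlapping region and (ii) a length/position difference term.
    Returns 1.0 for identical sequences, lower for divergent ones."""
    n = min(len(seq_a), len(seq_b))
    hamming = sum(1 for i in range(n) if seq_a[i] != seq_b[i])
    length_diff = abs(len(seq_a) - len(seq_b))
    mismatch = hamming + length_diff
    total = max(len(seq_a), len(seq_b))
    return 1.0 - mismatch / total

# Two DNA fragments differing at one position:
print(bilateral_similarity("ATGGTGCACC", "ATGGTGAACC"))  # → 0.9
```

    A single linear scan suffices, which is what makes an O(N) comparison practical for long genomic sequences where O(N²) alignment-based methods become expensive.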

Avrim Blum - One of the best experts on this subject based on the ideXlab platform.

  • COLT - Improved Guarantees for Learning via Similarity Functions
    2008
    Co-Authors: Maria-florina Balcan, Avrim Blum, Nathan Srebro
    Abstract:

    We continue the investigation of natural conditions for a similarity function to allow learning, without requiring the similarity function to be a valid kernel or referring to an implicit high-dimensional space. We provide a new notion of a "good similarity function" that builds upon the previous definition of Balcan and Blum (2006) but improves on it in two important ways. First, as with the previous definition, any large-margin kernel is also a good similarity function in our sense, but the translation now results in a much milder increase in the labeled sample complexity. Second, we prove that for distribution-specific PAC learning, our new notion is strictly more powerful than the traditional notion of a large-margin kernel. In particular, we show that for any hypothesis class C there exists a similarity function under our definition allowing learning with O(log |C|) labeled examples. However, in a lower bound which may be of independent interest, we show that for any class C of pairwise uncorrelated functions, there is no kernel with margin γ ≥ 8/√|C| for all f ∈ C, even if one allows average hinge loss as large as 0.5. Thus, the sample complexity for learning such classes with SVMs is Ω(|C|). This extends work of Ben-David et al. (2003) and Forster and Simon (2006), who give hardness results with comparable margin bounds but at much lower error rates. Our new notion of similarity relies upon L1-regularized learning, and our separation result is related to a separation between what is learnable with L1 vs. L2 regularization.

  • On a theory of learning with similarity functions
    International Conference on Machine Learning, 2006
    Co-Authors: Maria-florina Balcan, Avrim Blum
    Abstract:

    Kernel functions have become an extremely popular tool in machine learning, with an attractive theory as well. This theory views a kernel as implicitly mapping data points into a possibly very high-dimensional space, and describes a kernel function as being good for a given learning problem if data is separable by a large margin in that implicit space. However, while quite elegant, this theory does not directly correspond to one's intuition of a good kernel as a good similarity function. Furthermore, it may be difficult for a domain expert to use the theory to help design an appropriate kernel for the learning task at hand, since the implicit mapping may not be easy to calculate. Finally, the requirement of positive semi-definiteness may rule out the most natural pairwise similarity functions for the given problem domain. In this work we develop an alternative, more general theory of learning with similarity functions (i.e., sufficient conditions for a similarity function to allow one to learn well) that does not require reference to implicit spaces and does not require the function to be positive semi-definite (or even symmetric). Our results also generalize the standard theory in the sense that any good kernel function under the usual definition can be shown to also be a good similarity function under our definition (though with some loss in the parameters). In this way, we provide the first steps towards a theory of kernels that describes the effectiveness of a given kernel function in terms of natural similarity-based properties.
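    The learning scheme behind this line of work can be sketched as follows: represent each point by its similarities to a handful of sampled "landmark" points, then train an ordinary linear classifier on those features. The particular similarity function, the plain perceptron, and the toy 1-D data below are illustrative choices; the point is only that no positive semi-definiteness of the similarity is ever needed.

```python
import random

def similarity(x, y):
    """An arbitrary pairwise similarity (not required to be a valid
    kernel): decays with the distance between the two points."""
    return 1.0 / (1.0 + abs(x - y))

def to_features(x, landmarks):
    """Landmark construction: represent x by its similarities to a
    small sample of landmark points."""
    return [similarity(x, l) for l in landmarks]

def train_perceptron(X, y, epochs=50):
    """Plain perceptron on the similarity features (a simple stand-in
    for the linear learners analyzed in this theory)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if sum(wj * fj for wj, fj in zip(w, xi)) + b > 0 else -1
            if pred != yi:
                w = [wj + yi * fj for wj, fj in zip(w, xi)]
                b += yi
    return w, b

random.seed(0)
# Two 1-D clusters: negatives near 0, positives near 5.
data = [(random.gauss(0, 0.5), -1) for _ in range(20)] + \
       [(random.gauss(5, 0.5), +1) for _ in range(20)]
landmarks = [x for x, _ in random.sample(data, 6)]
X = [to_features(x, landmarks) for x, _ in data]
y = [label for _, label in data]
w, b = train_perceptron(X, y)
correct = sum(1 for xi, yi in zip(X, y)
              if (1 if sum(wj * fj for wj, fj in zip(w, xi)) + b > 0 else -1) == yi)
print(f"training accuracy: {correct}/{len(y)}")
```

    If the similarity function is "good" in the paper's sense, the landmark features make the data (approximately) linearly separable, so any standard linear learner suffices; no implicit feature space is ever constructed.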

Lusann Yang - One of the best experts on this subject based on the ideXlab platform.

  • Proposed definition of crystal substructure and substructural similarity
    Physical Review B, 2014
    Co-Authors: Lusann Yang, Stephen Dacek, Gerbrand Ceder
    Abstract:

    There is a clear need for a practical and mathematically rigorous description of local structure in inorganic compounds, so that structures and chemistries can be easily compared across large data sets. Here a method for decomposing crystal structures into substructures is given, and a similarity function between those substructures is defined. The similarity function is based on both geometric and chemical similarity. This construction allows for large-scale data mining of substructural properties, and for the analysis of substructures and void spaces within crystal structures. The method is validated via the prediction of Li-ion intercalation sites for oxides. Tested on databases of known Li-ion-containing oxides, the method reproduces all Li-ion sites in an oxide, with a maximum of four incorrect guesses, 80% of the time.

  • Data-mined similarity function between material compositions
    Physical Review B, 2013
    Co-Authors: Lusann Yang, Gerbrand Ceder
    Abstract:

    A new method for assessing the similarity of material compositions is described. A similarity measure is important for the classification and clustering of compositions. The similarity of two material compositions is calculated using a data-mined ionic substitutional similarity, based on the probability with which two ions will substitute for each other within the same structure prototype. The method is validated via the prediction of crystal-structure prototypes for oxides from the Inorganic Crystal Structure Database, selecting the correct prototype from a list of known prototypes within five guesses 75% of the time. It performs particularly well on quaternary oxides, selecting the correct prototype on the first guess 65% of the time.

Elsa Dupraz - One of the best experts on this subject based on the ideXlab platform.

  • The p-value as a new similarity function for spectral clustering in sensor networks
    2018 IEEE Statistical Signal Processing Workshop (SSP), 2018
    Co-Authors: Mael Bompais, Hamza Ameur, Dominique Pastor, Elsa Dupraz
    Abstract:

    In this paper, we consider spectral clustering over data collected by a network of sensors. In this context, the spatial data distribution is not necessarily uniform and can further be affected by sensor noise. This is why we propose a new similarity measure for spectral clustering in sensor networks. This similarity function is derived as the p-value of a hypothesis test that has to decide whether two sensor measurements belong to the same cluster. Unlike other existing similarity measures, the p-value takes into account both the local data densities and the fact that the noise variance can vary from sensor to sensor. Simulation results show that the p-value leads to better spectral clustering performance than the standard Gaussian kernel when there is some noise in the collected data.
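    The idea of using a p-value as an affinity can be sketched with a simple two-sided Gaussian test. The test statistic below (normalizing the difference of two measurements by the sum of their per-sensor noise variances) is an illustrative assumption; the paper's test may differ, but the key property is the same: the affinity adapts to each sensor's noise level instead of using one global kernel bandwidth.

```python
import math

def p_value_similarity(x_i, x_j, var_i, var_j):
    """Two-sided Gaussian test of H0: the two noisy measurements share
    the same underlying value. Under H0, x_i - x_j ~ N(0, var_i + var_j),
    and the p-value of the observed difference is used as the affinity."""
    z = abs(x_i - x_j) / math.sqrt(var_i + var_j)
    return math.erfc(z / math.sqrt(2.0))  # P(|Z| >= z) for Z ~ N(0, 1)

# Two sensors observing the same phenomenon, one much noisier:
print(p_value_similarity(1.0, 1.2, 0.04, 0.25))  # high p-value: same cluster
print(p_value_similarity(1.0, 4.0, 0.04, 0.25))  # tiny p-value: different
```

    Filling the affinity matrix of a spectral-clustering pipeline with these p-values (in place of the usual Gaussian kernel entries) is all that changes; a moderate difference between a precise and a noisy sensor is forgiven, while the same difference between two precise sensors is not.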