Intrinsic Dimensionality

The experts below are selected from a list of 6,546 experts worldwide, ranked by the ideXlab platform.

Atsushi Nitanda - One of the best experts on this subject based on the ideXlab platform.

  • Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space
    arXiv: Machine Learning, 2019
    Co-Authors: Taiji Suzuki, Atsushi Nitanda
    Abstract:

    Deep learning has exhibited superior performance for various tasks, especially for high-dimensional datasets, such as images. To understand this property, we investigate the approximation and estimation ability of deep learning on anisotropic Besov spaces. The anisotropic Besov space is characterized by direction-dependent smoothness and includes several function classes that have been investigated thus far. We demonstrate that the approximation error and estimation error of deep learning depend only on the average value of the smoothness parameters in all directions. Consequently, the curse of dimensionality can be avoided if the smoothness of the target function is highly anisotropic. Unlike existing studies, our analysis does not require a low-dimensional structure of the input data. We also investigate the minimax optimality of deep learning and compare its performance with that of the kernel method (more generally, linear estimators). The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.
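
    To make the "average smoothness" concrete: writing s = (s_1, ..., s_d) for the direction-wise smoothness parameters, the rates in this line of work are governed by a harmonic-mean-type aggregate of the s_i. The following display is only a simplified sketch of the paper's result, with constants, logarithmic factors, and regularity conditions omitted:

        \tilde{s} = \Bigl( \sum_{i=1}^{d} \frac{1}{s_i} \Bigr)^{-1},
        \qquad
        \mathbb{E}\bigl[ \| \hat{f} - f^{\ast} \|_{L^2}^{2} \bigr]
          \;\lesssim\; n^{-\frac{2\tilde{s}}{2\tilde{s} + 1}}.

    For isotropic smoothness s_i \equiv s this reduces to the classical nonparametric rate n^{-2s/(2s+d)}; only the aggregate \tilde{s} appears, so a few very smooth directions (large s_i) can offset the ambient dimension d.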

Erik Thordsen - One of the best experts on this subject based on the ideXlab platform.

  • ABID: Angle Based Intrinsic Dimensionality
    Similarity Search and Applications, 2020
    Co-Authors: Erik Thordsen, Erich Schubert
    Abstract:

    The intrinsic dimensionality refers to the “true” dimensionality of the data, as opposed to the dimensionality of the data representation. For example, when attributes are highly correlated, the intrinsic dimensionality can be much lower than the number of variables. Local intrinsic dimensionality refers to the observation that this property can vary for different parts of the data set, and intrinsic dimensionality can serve as a proxy for the local difficulty of the data set.

  • ABID: Angle Based Intrinsic Dimensionality
    arXiv: Machine Learning, 2020
    Co-Authors: Erik Thordsen, Erich Schubert
    Abstract:

    The intrinsic dimensionality refers to the “true” dimensionality of the data, as opposed to the dimensionality of the data representation. For example, when attributes are highly correlated, the intrinsic dimensionality can be much lower than the number of variables. Local intrinsic dimensionality refers to the observation that this property can vary for different parts of the data set, and intrinsic dimensionality can serve as a proxy for the local difficulty of the data set. The most popular methods for estimating local intrinsic dimensionality are based on distances, and on the rate at which the distances to the nearest neighbors increase, a concept known as “expansion dimension”. In this paper we introduce an orthogonal concept, which does not use any distances: we use the distribution of angles between neighbor points. We derive the theoretical distribution of angles and use this to construct an estimator for intrinsic dimensionality. Experimentally, we verify that this measure behaves similarly, but complementarily, to existing measures of intrinsic dimensionality. By introducing a new idea of intrinsic dimensionality to the research community, we hope to contribute to a better understanding of intrinsic dimensionality and to spur new research in this direction.
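
    The geometric fact behind the angle-based approach is that for two independent, isotropically random directions u, v in R^d, the expected squared cosine E[(u·v)^2] equals 1/d. The numpy sketch below inverts the mean squared cosine between difference vectors from each point to its k nearest neighbors; it illustrates the idea, but it is not the exact ABID estimator from the paper, whose weighting and aggregation differ:

        import numpy as np

        def angle_based_id(X, k=20):
            """Rough angle-based local ID estimate per point (illustrative sketch).

            Relies on E[cos^2(theta)] = 1/d for isotropic random directions in R^d;
            not the exact ABID estimator from the paper.
            """
            n = X.shape[0]
            d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # brute-force distances
            np.fill_diagonal(d2, np.inf)
            ids = np.empty(n)
            for i in range(n):
                V = X[np.argsort(d2[i])[:k]] - X[i]          # vectors to k neighbors
                V /= np.linalg.norm(V, axis=1, keepdims=True)
                cos = (V @ V.T)[np.triu_indices(k, 1)]       # distinct neighbor pairs
                ids[i] = 1.0 / np.mean(cos ** 2)             # invert mean squared cosine
            return ids

        # Points on a 2-D plane embedded in R^10 should yield estimates near 2.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))
        print(np.median(angle_based_id(X)))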

Michael E Houle - One of the best experts on this subject based on the ideXlab platform.

  • High Intrinsic Dimensionality Facilitates Adversarial Attack: Theoretical Evidence
    IEEE Transactions on Information Forensics and Security, 2021
    Co-Authors: Laurent Amsaleg, Michael E Houle, James Bailey, Teddy Furon, Sarah M Erfani, Milos Radovanovic, Amelie Barbe, Xuan Vinh Nguyen
    Abstract:

    Machine learning systems are vulnerable to adversarial attack. By applying a small, carefully designed perturbation to the input object, a classifier can be tricked into making an incorrect prediction. This phenomenon has drawn wide interest, with many attempts made to explain it. However, a complete understanding is yet to emerge. In this paper we adopt a slightly different perspective, still relevant to classification. We consider retrieval, where the output is a set of objects most similar to a user-supplied query object, corresponding to the set of k-nearest neighbors. We investigate the effect of adversarial perturbation on the ranking of objects with respect to a query. Through theoretical analysis, supported by experiments, we demonstrate that as the intrinsic dimensionality of the data domain rises, the amount of perturbation required to subvert neighborhood rankings diminishes, and the vulnerability to adversarial attack rises. We examine two modes of perturbation of the query: either ‘closer’ to the target point, or ‘farther’ from it. We also consider two perspectives: ‘query-centric’, examining the effect of perturbation on the query’s own neighborhood ranking, and ‘target-centric’, considering the ranking of the query point in the target’s neighborhood set. All four cases correspond to practical scenarios involving classification and retrieval.
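
    The distance-concentration effect at the heart of this result is easy to observe numerically. By the triangle inequality, any perturbation of the query that swaps its first and second nearest neighbors must have norm at least (d2 - d1)/2, where d1 and d2 are the two neighbor distances; the toy experiment below (our illustration, not the authors' analysis) shows that this relative budget floor shrinks as the dimension grows:

        import numpy as np

        rng = np.random.default_rng(42)

        def median_flip_margin(d, n=2000, trials=200):
            """Median of (d2 - d1) / (2 * d1) over random uniform datasets.

            Any query perturbation swapping the top two neighbors needs norm
            at least (d2 - d1) / 2, so this ratio is a relative budget floor.
            Illustrative only; not the paper's formal analysis.
            """
            margins = []
            for _ in range(trials):
                X = rng.uniform(size=(n, d))
                q = rng.uniform(size=d)
                d1, d2 = np.sort(np.linalg.norm(X - q, axis=1))[:2]
                margins.append((d2 - d1) / (2 * d1))
            return float(np.median(margins))

        for d in (2, 8, 32, 128):
            print(d, median_flip_margin(d))   # margin shrinks as d grows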

  • Local Intrinsic Dimensionality III: Density and Similarity
    Similarity Search and Applications, 2020
    Co-Authors: Michael E Houle
    Abstract:

    In artificial intelligence, machine learning, and other areas in which statistical estimation and modeling are common, distributions are typically assumed to admit a representation in terms of a probability density function (pdf). However, in many situations, such as mixture modeling and subspace methods, the distributions in question are not always describable in terms of a single pdf. In this paper, we present a theoretical foundation for the modeling of density ratios in terms of the local intrinsic dimensionality (LID) model, in a way that avoids the use of traditional probability density functions. These formulations provide greater flexibility when modeling data under the assumption of local variation in intrinsic dimensionality, in that no explicit dependence on a fixed-dimensional data representation is required.
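
    For reference, the LID model used in this series characterizes a distribution through the growth rate of the probability measure of expanding neighborhoods rather than through a pdf. With F(r) the probability measure of a ball of radius r around the point of interest (assumed positive and differentiable where needed), the definition from the earlier papers in the series is:

        \mathrm{LID}_F(r)
          \;=\; \lim_{\epsilon \to 0^{+}}
            \frac{\ln\!\bigl( F((1+\epsilon)\,r) \,/\, F(r) \bigr)}{\ln(1+\epsilon)}
          \;=\; \frac{r\,F'(r)}{F(r)},
        \qquad
        \mathrm{LID}_F \;=\; \lim_{r \to 0^{+}} \mathrm{LID}_F(r).

    Density ratios can then be expressed through F and its growth rates directly, which is what lets the paper dispense with a fixed-dimensional pdf.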

  • Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality
    arXiv: Learning, 2019
    Co-Authors: Sukarna Barua, Michael E Houle, Sarah M Erfani, James Bailey
    Abstract:

    Generative Adversarial Networks (GANs) are an elegant mechanism for data generation. However, a key challenge when using GANs is how to best measure their ability to generate realistic data. In this paper, we demonstrate that an intrinsic dimensional characterization of the data space learned by a GAN model leads to an effective evaluation metric for GAN quality. In particular, we propose a new evaluation measure, CrossLID, that assesses the local intrinsic dimensionality (LID) of real-world data with respect to neighborhoods found in GAN-generated samples. Intuitively, CrossLID measures the degree to which the manifolds of two data distributions coincide with each other. In experiments on four benchmark image datasets, we compare our proposed measure to several state-of-the-art evaluation metrics. Our experiments show that CrossLID is strongly correlated with the progress of GAN training, is sensitive to mode collapse, and is robust to small-scale noise, image transformations, and sample size. Furthermore, we show how CrossLID can be used within the GAN training process to improve generation quality.
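
    The LID estimates underlying such a measure are commonly obtained with the maximum-likelihood (Hill-type) estimator computed from nearest-neighbor distances. The sketch below captures the cross-distribution idea, estimating the LID of each real point using neighborhoods drawn from the generated sample only; it assumes the standard MLE form, and the paper's exact CrossLID aggregation and preprocessing may differ:

        import numpy as np

        def mle_lid(dists):
            """Hill/MLE local ID estimate from sorted k-NN distances."""
            return -1.0 / np.mean(np.log(dists[:-1] / dists[-1]))

        def cross_lid(X_real, X_gen, k=20):
            """Mean LID of real points w.r.t. neighborhoods in generated data.

            Sketch of the cross-distribution idea only; the paper's exact
            CrossLID aggregation and preprocessing may differ.
            """
            scores = []
            for x in X_real:
                d = np.sort(np.linalg.norm(X_gen - x, axis=1))[:k]
                scores.append(mle_lid(d))
            return float(np.mean(scores))

        rng = np.random.default_rng(1)
        real = rng.normal(size=(1000, 16))
        good = rng.normal(size=(1000, 16))          # sample matching the real data
        collapsed = rng.normal(size=(1, 16)) + 0.01 * rng.normal(size=(1000, 16))
        print(cross_lid(real, good))       # moderate value: manifolds coincide
        print(cross_lid(real, collapsed))  # deviates sharply under mode collapse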

  • On the Correlation Between Local Intrinsic Dimensionality and Outlierness
    Similarity Search and Applications, 2018
    Co-Authors: Michael E Houle, Erich Schubert, Arthur Zimek
    Abstract:

    Data mining methods for outlier detection are usually based on variants of non-parametric density estimation. Here we argue for the use of local intrinsic dimensionality as a measure of outlierness, and demonstrate empirically that it is a meaningful alternative and complement to classic methods.

  • LID-Fingerprint: A Local Intrinsic Dimensionality-Based Fingerprinting Method
    Similarity Search and Applications, 2018
    Co-Authors: Michael E Houle, Vincent Oria, Arwa Wali, Kurt Rohloff
    Abstract:

    One of the most important information hiding techniques is fingerprinting, which aims to generate new representations for data that are significantly more compact than the original. Fingerprinting is a promising technique for secure and efficient similarity search for multimedia data on the cloud. In this paper, we propose LID-Fingerprint, a simple binary fingerprinting technique for high-dimensional data. The binary fingerprints are derived from sparse representations of the data objects, which are generated using a feature selection criterion, Support-Weighted Intrinsic Dimensionality (support-weighted ID), within a similarity graph construction method, NNWID-Descent. The sparsification process employed by LID-Fingerprint significantly reduces the information content of the data, thus ensuring data suppression and data masking. Experimental results show that LID-Fingerprint is able to generate compact binary fingerprints while allowing a reasonable level of search accuracy.

Paola Campadelli - One of the best experts on this subject based on the ideXlab platform.

  • DANCo: An Intrinsic Dimensionality Estimator Exploiting Angle and Norm Concentration
    Pattern Recognition, 2014
    Co-Authors: Claudio Ceruti, Alessandro Rozza, Gabriele Lombardi, Elena Casiraghi, Simone Bassis, Paola Campadelli
    Abstract:

    In the past decade the development of automatic intrinsic dimensionality estimators has gained considerable attention due to its relevance in several application fields. However, most of the proposed solutions prove not to be robust on noisy datasets, and provide unreliable results when the intrinsic dimensionality of the input dataset is high and the manifold where the points are assumed to lie is nonlinearly embedded in a higher dimensional space. In this paper we propose a novel intrinsic dimensionality estimator (DANCo) and its faster variant (FastDANCo), which exploit the information conveyed both by the normalized nearest neighbor distances and by the angles computed on couples of neighboring points. The effectiveness and robustness of the proposed algorithms are assessed by experiments on synthetic and real datasets, by comparative evaluation against state-of-the-art methodologies, and by significance tests.
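
    The "norm concentration" component can be made explicit. Under a locally uniform model, the distance from a point to its nearest neighbor, normalized by the distance to its (k+1)-st neighbor, ρ = r_1 / r_{k+1} ∈ (0, 1), has a density whose shape depends only on the dimension d, so d can be fitted by maximum likelihood. The display below sketches this one ingredient; the angle component (fitted with a von Mises distribution) and the KL-divergence comparison against synthetically generated data are the further steps described in the paper:

        g(\rho \mid d, k)
          \;=\; k\,d\,\rho^{\,d-1}\bigl(1 - \rho^{\,d}\bigr)^{k-1},
        \qquad \rho \in (0,1),
        \qquad
        \hat{d} \;=\; \arg\max_{d}\; \sum_{i=1}^{n} \ln g(\rho_i \mid d, k).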

  • Local Intrinsic Dimensionality Based Features for Clustering
    International Conference on Image Analysis and Processing, 2013
    Co-Authors: Paola Campadelli, Claudio Ceruti, Gabriele Lombardi, Elena Casiraghi, Alessandro Rozza
    Abstract:

    One of the fundamental tasks of unsupervised learning is dataset clustering: partitioning the input dataset into clusters composed of somehow “similar” objects that “differ” from the objects belonging to other classes. To this end, in this paper we assume that the different clusters are drawn from different, possibly intersecting, geometrical structures represented by manifolds embedded in a possibly higher dimensional space. Each manifold is characterized by its own intrinsic dimensionality, which possibly differs from the intrinsic dimensionalities of the other manifolds. Under these assumptions, we code the input data by means of local intrinsic dimensionality estimates and features related to them, and we subsequently apply simple and basic clustering algorithms, since our interest is specifically aimed at assessing the discriminative power of the proposed features. Their encouraging discriminative quality is shown by a feature relevance test, by the clustering results achieved on both synthetic and real datasets, and by comparison to related and classical state-of-the-art clustering approaches.
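
    A minimal version of this pipeline, assuming a generic MLE-based local ID estimate as the per-point feature and off-the-shelf k-means as the "simple and basic" clustering stage (the paper's feature set is richer than this single estimate):

        import numpy as np
        from sklearn.cluster import KMeans

        def local_id_features(X, k=15):
            """One feature per point: MLE local ID from its k-NN distances."""
            d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            np.fill_diagonal(d2, np.inf)
            feats = np.empty((X.shape[0], 1))
            for i in range(X.shape[0]):
                d = np.sqrt(np.sort(d2[i])[:k])
                feats[i, 0] = -1.0 / np.mean(np.log(d[:-1] / d[-1]))
            return feats

        # Two intersecting manifolds of different intrinsic dimension in R^5.
        rng = np.random.default_rng(2)
        curve = rng.normal(size=(300, 1)) @ rng.normal(size=(1, 5))   # ~1-D structure
        sheet = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 5))   # ~3-D structure
        X = np.vstack([curve, sheet])
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(local_id_features(X))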

  • Novel High Intrinsic Dimensionality Estimators
    Machine Learning, 2012
    Co-Authors: Alessandro Rozza, Gabriele Lombardi, Claudio Ceruti, Elena Casiraghi, Paola Campadelli
    Abstract:

    Recently, a great deal of research work has been devoted to the development of algorithms to estimate the intrinsic dimensionality (ID) of a given dataset, that is, the minimum number of parameters needed to represent the data without information loss. ID estimation is important for the following reasons: the capacity and the generalization capability of discriminant methods depend on it; ID is necessary information for any dimensionality reduction technique; in neural network design, the number of hidden units in the encoding middle layer should be chosen according to the ID of the data; and the ID value is strongly related to the model order of a time series, which is crucial for obtaining reliable time series predictions. Although many estimation techniques have been proposed in the literature, most of them fail on noisy data, or compute underestimated values when the ID is sufficiently high. In this paper, after reviewing some of the most important ID estimators related to our work, we provide a theoretical motivation of the bias that causes the underestimation effect, and we present two ID estimators based on the statistical properties of manifold neighborhoods, which have been developed in order to reduce this effect. We exhaustively evaluate the proposed techniques on synthetic and real datasets, employing an objective evaluation measure to compare their performance with that achieved by state-of-the-art algorithms; the results show that the proposed methods are promising, and produce reliable estimates also in the difficult case of datasets drawn from non-linearly embedded manifolds characterized by high ID.
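
    One family of estimators in this spirit fits the dimension by maximizing the likelihood of normalized nearest-neighbor distances under a locally uniform model, using the same density g(ρ | d, k) sketched after the DANCo entry above. A numerical sketch, scanning integer candidate dimensions (the paper's estimators add the bias-correction machinery motivated by the underestimation analysis):

        import numpy as np

        def ml_dimension_sketch(X, k=10, d_max=50):
            """ML fit of d from rho = r_1 / r_(k+1) per point (illustrative sketch).

            Assumes local uniformity, under which rho has density
            g(rho | d, k) = k * d * rho**(d-1) * (1 - rho**d)**(k-1).
            """
            sq = np.square(X).sum(axis=1)
            d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared distances
            np.fill_diagonal(d2, np.inf)
            srt = np.sqrt(np.maximum(np.sort(d2, axis=1), 0.0))
            rho = np.clip(srt[:, 0] / srt[:, k], 1e-12, 1.0 - 1e-12)

            def loglik(d):
                return np.sum(np.log(k * d) + (d - 1) * np.log(rho)
                              + (k - 1) * np.log1p(-rho ** d))

            return max(range(1, d_max + 1), key=loglik)

        rng = np.random.default_rng(3)
        X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 20))  # 5-D data in R^20
        print(ml_dimension_sketch(X))   # should report a value near 5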

  • A Novel Intrinsic Dimensionality Estimator Based on Rank-Order Statistics
    Revised Selected Papers of the First International Workshop on Clustering High-Dimensional Data - Volume 7627, 2012
    Co-Authors: Simone Bassis, Alessandro Rozza, Gabriele Lombardi, Claudio Ceruti, Elena Casiraghi, Paola Campadelli
    Abstract:

    In the past two decades the estimation of the intrinsic dimensionality of a dataset has gained considerable importance, since it is relevant information for several real-life applications. Unfortunately, although a great deal of research effort has been devoted to the development of effective intrinsic dimensionality estimators, the problem is still open. For this reason, in this paper we propose a novel robust intrinsic dimensionality estimator that exploits the information conveyed by the normalized nearest neighbor distances, through a technique based on rank-order statistics that limits the common underestimation issues related to the edge effect. Experiments performed on both synthetic and real datasets highlight the robustness and effectiveness of the proposed algorithm when compared to state-of-the-art methodologies.