High Dimensionality

14,000,000 Leading Edge Experts on the ideXlab platform

The experts below are selected from a list of 250,755 experts worldwide ranked by the ideXlab platform.

Heggere S Ranganath - One of the best experts on this subject based on the ideXlab platform.

  • An analysis of time series representation methods: data mining applications perspective
    ACM Southeast Regional Conference, 2014
    Co-Authors: Vineetha Bettaiah, Heggere S Ranganath
    Abstract:

    Because of high dimensionality, proven data mining and pattern recognition methods are not suitable for processing time series data. As a result, several time series representations capable of achieving significant reduction in dimensionality without losing important features have been developed. Each representation has its own advantages and disadvantages. In this paper, based on the requirements of key data mining applications, such as clustering, classification and query by content, characteristics desired in an ideal time series representation are identified. Using the identified characteristics as metrics, widely known time series representation methods are evaluated to determine the extent to which the representations satisfy the requirements.
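
As a concrete illustration (not drawn from the paper itself), Piecewise Aggregate Approximation (PAA) is one widely known representation of the kind surveyed here: it reduces dimensionality by averaging the series over equal-width segments. A minimal sketch, assuming the series length is divisible by the number of segments:

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: reduce a time series to
    n_segments values by averaging over equal-width windows."""
    n = len(series)
    seg_len = n // n_segments  # assumes n is divisible by n_segments
    return [sum(series[i * seg_len:(i + 1) * seg_len]) / seg_len
            for i in range(n_segments)]

# An 8-point series reduced to 4 values.
print(paa([1, 3, 2, 4, 6, 8, 7, 5], 4))  # [2.0, 3.0, 7.0, 6.0]
```

Representations like this trade fidelity for compactness; the survey's metrics ask how much of the original structure survives the reduction.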

Arnak S Dalalyan - One of the best experts on this subject based on the ideXlab platform.

  • Tight conditions for consistency of variable selection in the context of high dimensionality
    Annals of Statistics, 2012
    Co-Authors: Laetitia Comminges, Arnak S Dalalyan
    Abstract:

    We address the issue of variable selection in the regression model with very high ambient dimension, i.e., when the number of variables is very large. The main focus is on the situation where the number of relevant variables, called intrinsic dimension and denoted by $d^*$, is much smaller than the ambient dimension $d$. Without assuming any parametric form of the underlying regression function, we get tight conditions making it possible to consistently estimate the set of relevant variables. These conditions relate the intrinsic dimension to the ambient dimension and to the sample size. The procedure that is provably consistent under these tight conditions is based on comparing quadratic functionals of the empirical Fourier coefficients with appropriately chosen threshold values. The asymptotic analysis reveals the presence of two quite different regimes. The first regime is when $d^*$ is fixed. In this case the situation in nonparametric regression is the same as in linear regression, i.e., consistent variable selection is possible if and only if $\log d$ is small compared to the sample size $n$. The picture is different in the second regime, $d^*\to\infty$ as $n\to\infty$, where we prove that consistent variable selection in nonparametric set-up is possible only if $d^*+\log\log d$ is small compared to $\log n$. We apply these results to derive minimax separation rates for the problem of variable selection.
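
The thresholding idea can be illustrated with a toy sketch (a deliberate simplification, not the paper's actual procedure): for each coordinate, sum the squares of a few low-order empirical Fourier coefficients of the response against that coordinate, and keep the coordinates whose statistic exceeds a threshold. The frequency count and threshold below are arbitrary illustrative choices.

```python
import math
import random

def select_relevant(X, y, n_freq=3, threshold=0.05):
    """Toy variable selection: for each coordinate j, estimate low-order
    empirical Fourier coefficients of y against X[:, j] and keep j if
    their summed squares exceed a threshold."""
    n, d = len(X), len(X[0])
    relevant = []
    for j in range(d):
        q = 0.0
        for k in range(1, n_freq + 1):
            c = sum(y[i] * math.cos(2 * math.pi * k * X[i][j])
                    for i in range(n)) / n
            s = sum(y[i] * math.sin(2 * math.pi * k * X[i][j])
                    for i in range(n)) / n
            q += c * c + s * s
        if q > threshold:
            relevant.append(j)
    return relevant

# y depends only on coordinate 0 out of d = 10 ambient coordinates.
random.seed(0)
X = [[random.random() for _ in range(10)] for _ in range(2000)]
y = [math.cos(2 * math.pi * x[0]) for x in X]
print(select_relevant(X, y))  # [0]
```

For irrelevant coordinates the empirical coefficients are of order $1/\sqrt{n}$, so their squared sum falls below the threshold; the relevant coordinate carries an order-one coefficient and survives.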

  • Tight conditions for consistency of variable selection in the context of high dimensionality
    arXiv: Statistics Theory, 2011
    Co-Authors: Laetitia Comminges, Arnak S Dalalyan
    Abstract:

    We address the issue of variable selection in the regression model with very high ambient dimension, that is, when the number of variables is very large. The main focus is on the situation where the number of relevant variables, called intrinsic dimension, is much smaller than the ambient dimension d. Without assuming any parametric form of the underlying regression function, we get tight conditions making it possible to consistently estimate the set of relevant variables. These conditions relate the intrinsic dimension to the ambient dimension and to the sample size. The procedure that is provably consistent under these tight conditions is based on comparing quadratic functionals of the empirical Fourier coefficients with appropriately chosen threshold values. The asymptotic analysis reveals the presence of two quite different regimes. The first regime is when the intrinsic dimension is fixed. In this case the situation in nonparametric regression is the same as in linear regression, that is, consistent variable selection is possible if and only if log d is small compared to the sample size n. The picture is different in the second regime, that is, when the number of relevant variables denoted by s tends to infinity as $n\to\infty$. Then we prove that consistent variable selection in nonparametric set-up is possible only if s+loglog d is small compared to log n. We apply these results to derive minimax separation rates for the problem of variable selection.

Vineetha Bettaiah - One of the best experts on this subject based on the ideXlab platform.

  • An analysis of time series representation methods: data mining applications perspective
    ACM Southeast Regional Conference, 2014
    Co-Authors: Vineetha Bettaiah, Heggere S Ranganath
    Abstract:

    Because of high dimensionality, proven data mining and pattern recognition methods are not suitable for processing time series data. As a result, several time series representations capable of achieving significant reduction in dimensionality without losing important features have been developed. Each representation has its own advantages and disadvantages. In this paper, based on the requirements of key data mining applications, such as clustering, classification and query by content, characteristics desired in an ideal time series representation are identified. Using the identified characteristics as metrics, widely known time series representation methods are evaluated to determine the extent to which the representations satisfy the requirements.

Raj P. Gopalan - One of the best experts on this subject based on the ideXlab platform.

  • Australian Conference on Artificial Intelligence - Clustering transactional data streams
    Lecture Notes in Computer Science, 2006
    Co-Authors: Yanrong Li, Raj P. Gopalan
    Abstract:

    The challenge of mining data streams is threefold. Firstly, an algorithm for a particular data mining task is subject to the sequential one-pass constraint; secondly, it must work under bounded resources such as memory and disk space; thirdly, it should have capabilities to answer time-sensitive queries. Dealing with transactional data streams is even more challenging due to their high dimensionality and sparseness. In this paper, algorithms for clustering transactional data streams are proposed by incorporating the incremental clustering algorithm INCLUS into the equal-width time window model and the elastic time window model. These algorithms can efficiently cluster a transactional data stream in one pass and answer time-sensitive queries at different granularities with limited resources.
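
The one-pass constraint can be made concrete with a generic incremental clustering sketch over transactions (item sets). This is a stand-in for the single-pass setting discussed above, not the INCLUS algorithm itself; the similarity measure, threshold, and cluster summary (union of member items) are illustrative simplifications.

```python
def jaccard(a, b):
    """Jaccard similarity of two item sets."""
    return len(a & b) / len(a | b) if a or b else 1.0

def one_pass_cluster(stream, threshold=0.5):
    """Each transaction is seen exactly once: it joins the most similar
    existing cluster if the similarity clears the threshold, otherwise
    it starts a new cluster. Clusters are summarized as (item union,
    member count) so memory stays bounded by the number of clusters."""
    clusters = []  # list of (item-union, member count)
    for txn in stream:
        best, best_sim = None, 0.0
        for i, (items, _) in enumerate(clusters):
            sim = jaccard(txn, items)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= threshold:
            items, count = clusters[best]
            clusters[best] = (items | txn, count + 1)
        else:
            clusters.append((set(txn), 1))
    return clusters

stream = [{"milk", "bread"}, {"milk", "bread", "eggs"},
          {"bolt", "nut"}, {"bolt", "nut", "washer"}]
print(one_pass_cluster(stream))  # two clusters of two transactions each
```

High dimensionality and sparseness show up here as the very large universe of possible items, of which each transaction contains only a few; set-based summaries sidestep an explicit vector representation.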

Wiktor Koźminski - One of the best experts on this subject based on the ideXlab platform.

  • Applications of high dimensionality experiments to biomolecular NMR
    Progress in Nuclear Magnetic Resonance Spectroscopy, 2015
    Co-Authors: Michal Nowakowski, Saurabh Saxena, Jan Stanek, Szymon Żerko, Wiktor Koźminski
    Abstract:

    High-dimensionality NMR experiments facilitate resonance assignment and precise determination of spectral parameters such as coupling constants. Sparse non-uniform sampling enables acquisition of high-dimensionality experiments with high resolution in acceptable time. In this review we present and compare some significant applications of NMR experiments of dimensionality higher than three in the field of biomolecular studies in solution.

  • High dimensionality 13C direct-detected NMR experiments for the automatic assignment of intrinsically disordered proteins
    Journal of Biomolecular NMR, 2013
    Co-Authors: Wolfgang Bermel, Wiktor Koźminski, Isabella C Felli, Leonardo Gonnelli, Alessandro Piai, Roberta Pierattelli, Anna Zawadzka-Kazimierczuk
    Abstract:

    We present three novel exclusively heteronuclear 5D 13C direct-detected NMR experiments, namely (HN-flipN)CONCACON, (HCA)CONCACON and (H)CACON(CA)CON, designed for easy sequence-specific resonance assignment of intrinsically disordered proteins (IDPs). The experiments proposed have been optimized to overcome the drawbacks which may dramatically complicate the characterization of IDPs by NMR, namely the small dispersion of chemical shifts and the fast exchange of the amide protons with the solvent. A fast and reliable automatic assignment of α-synuclein chemical shifts was obtained with the Tool for SMFT-based Assignment of Resonances (TSAR) program based on the information provided by these experiments.

  • TSAR: a program for automatic resonance assignment using 2D cross-sections of high dimensionality, high resolution spectra
    Journal of Biomolecular NMR, 2012
    Co-Authors: Anna Zawadzka-Kazimierczuk, Wiktor Koźminski, Martin Billeter
    Abstract:

    While NMR studies of proteins typically aim at structure, dynamics or interactions, resonance assignment represents in almost all cases the initial step of the analysis. With increasing complexity of the NMR spectra, for example due to a decreasing extent of ordered structure, this task often becomes both difficult and time-consuming, and the recording of high-dimensional data with high resolution may be essential. Random sampling of the evolution time space, combined with sparse multidimensional Fourier transform (SMFT), allows for efficient recording of very high-dimensional spectra (≥4 dimensions) while maintaining high resolution. However, the nature of this data demands automation of the assignment process. Here we present the program TSAR (Tool for SMFT-based Assignment of Resonances), which exploits all advantages of SMFT input. Moreover, its flexibility allows it to process data from any type of experiment that provides sequential connectivities. The algorithm was tested on several protein samples, including a disordered 81-residue fragment of the δ subunit of RNA polymerase from Bacillus subtilis containing various repetitive sequences. For our test examples, TSAR achieves a high percentage of assigned residues without any erroneous assignments.
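
The notion of "sequential connectivities" can be sketched with a toy linker (this only illustrates the idea of chaining residues by matched shifts; it is not the TSAR algorithm, and the peak fields and tolerance below are invented for the example): each peak carries the carbonyl shift of its own residue and of the preceding one, and chains form by matching one peak's shift to another's predecessor shift within a tolerance.

```python
def chain_by_connectivity(peaks, tol=0.05):
    """Link peaks into sequential chains: a peak q follows peak p when
    q['co_prev'] matches p['co'] within tol. Chains start at peaks whose
    'co_prev' matches no other peak's 'co'."""
    starts = [p for p in peaks
              if not any(abs(p["co_prev"] - q["co"]) <= tol
                         for q in peaks if q is not p)]
    chains = []
    for start in starts:
        chain, cur, used = [start["id"]], start, {start["id"]}
        while True:
            nxt = next((q for q in peaks if q["id"] not in used
                        and abs(q["co_prev"] - cur["co"]) <= tol), None)
            if nxt is None:
                break
            chain.append(nxt["id"])
            used.add(nxt["id"])
            cur = nxt
        chains.append(chain)
    return chains

# Three peaks that link into one sequential chain A -> B -> C.
peaks = [{"id": "A", "co": 174.20, "co_prev": 170.00},
         {"id": "B", "co": 175.10, "co_prev": 174.21},
         {"id": "C", "co": 173.00, "co_prev": 175.08}]
print(chain_by_connectivity(peaks))  # [['A', 'B', 'C']]
```

In practice ambiguity (several candidate matches within tolerance) is what makes automation hard, especially for disordered proteins with small shift dispersion; resolving it is the substance of programs like TSAR.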

  • Two-dimensional Fourier transform of arbitrarily sampled NMR data sets
    Journal of Magnetic Resonance, 2006
    Co-Authors: Krzysztof Kazimierczuk, Wiktor Koźminski, Igor Zhukov
    Abstract:

    A new procedure for Fourier transform with respect to more than one time variable simultaneously is proposed for NMR data processing. In the case of a two-dimensional transform, the spectrum is calculated for pairs of frequencies instead of by the conventional sequence of one-dimensional transforms. It therefore enables Fourier transformation of arbitrarily sampled time-domain data and thus allows for analysis of high-dimensionality spectra acquired in a short time. The proposed method is not limited to radial sampling; it requires only that the Nyquist theorem be fulfilled for two or more time domains considered at the same time. We show the application of the new approach to a 3D HNCO spectrum acquired for a protein sample with radial and spiral time-domain sampling.
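
The core idea, evaluating the transform directly at frequency pairs from arbitrarily placed time-domain samples, can be sketched as follows. This is a minimal illustration of a direct 2D Fourier evaluation, not the authors' implementation; real NMR processing adds weighting, normalization and quadrature handling.

```python
import cmath
import random

def ft2_arbitrary(samples, freqs):
    """Evaluate a 2D Fourier transform directly at (f1, f2) frequency
    pairs from arbitrarily sampled (t1, t2, value) points, instead of a
    sequence of 1D transforms on a regular grid."""
    spectrum = []
    for f1, f2 in freqs:
        s = sum(v * cmath.exp(-2j * cmath.pi * (f1 * t1 + f2 * t2))
                for t1, t2, v in samples)
        spectrum.append(s / len(samples))
    return spectrum

# A single 2D oscillation sampled at random times is recovered at its
# frequency pair (0.2, -0.1); elsewhere only small leakage remains.
random.seed(1)
pts = [(random.random() * 10, random.random() * 10) for _ in range(500)]
samples = [(t1, t2, cmath.exp(2j * cmath.pi * (0.2 * t1 - 0.1 * t2)))
           for t1, t2 in pts]
spec = ft2_arbitrary(samples, [(0.2, -0.1), (0.4, 0.3)])
print(abs(spec[0]), abs(spec[1]))  # ~1.0 at the true pair, small elsewhere
```

Because each output point is an explicit sum over the sampled times, no regular grid is required, which is exactly what frees the sampling scheme to be radial, spiral, or fully random.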