Minimum Description Length

The Experts below are selected from a list of 9528 Experts worldwide ranked by ideXlab platform

Peter Grunwald - One of the best experts on this subject based on the ideXlab platform.

  • Minimum Description Length revisited
    International Journal of Mathematics for Industry, 2019
    Co-Authors: Peter Grunwald, Teemu Roos
    Abstract:

    This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of MDL estimators. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC vs BIC and cross-validation vs Bayes, can, to a large extent, be viewed from a unified perspective.

  • Luckiness and Regret in Minimum Description Length Inference
    Handbook of the Philosophy of Science, 2011
    Co-Authors: Steven De Rooij, Peter Grunwald
    Abstract:

    Minimum Description Length (MDL) inference is based on the intuition that understanding the available data can be defined in terms of the ability to compress the data, i.e. to describe it in full using a shorter representation. This brief introduction discusses the design of the various codes used to implement MDL, focusing on the philosophically intriguing concepts of luckiness and regret: a good MDL code exhibits good performance in the worst case over all possible data sets, but achieves even better performance when the data turn out to be simple (although we suggest making no a priori assumptions to that effect). We then discuss how data compression relates to performance in various learning tasks, including parameter estimation, parametric and nonparametric model selection and sequential prediction of outcomes from an unknown source. Last, we briefly outline the history of MDL and its technical and philosophical relationship to other approaches to learning such as Bayesian, frequentist and prequential statistics.
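The notion of regret mentioned in this abstract can be illustrated with a small sketch. It assumes the Bernoulli model and the normalized maximum likelihood (NML) code, neither of which is spelled out in the abstract; the function names are hypothetical. Regret here means the extra bits a code spends compared with the best-fitting Bernoulli parameter chosen in hindsight.

```python
import math

def max_likelihood(k, n):
    """Probability of one binary sequence with k ones under the ML parameter k/n."""
    p = k / n
    return (p ** k) * ((1 - p) ** (n - k))

def nml_complexity_bits(n):
    """Log of the sum of maximized likelihoods over all binary sequences of length n."""
    total = sum(math.comb(n, k) * max_likelihood(k, n) for k in range(n + 1))
    return math.log2(total)

def nml_code_length_bits(k, n):
    """Code length the NML code assigns to a sequence with k ones out of n."""
    return -math.log2(max_likelihood(k, n)) + nml_complexity_bits(n)

def regret_bits(k, n):
    """Extra bits relative to the best Bernoulli parameter in hindsight."""
    return nml_code_length_bits(k, n) + math.log2(max_likelihood(k, n))

if __name__ == "__main__":
    n = 20
    for k in (0, 5, 10):
        print(k, round(nml_code_length_bits(k, n), 3), round(regret_bits(k, n), 3))
```

Under these assumptions the regret is the same for every data set of a given length, which is the worst-case guarantee, while a very regular sequence (all zeros) still receives a shorter total code length than a balanced one.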

  • The Minimum Description Length Principle
    MIT Press Books, 2007
    Co-Authors: Peter Grunwald
    Abstract:

    The Minimum Description Length (MDL) principle is a powerful method of inductive inference, the basis of statistical modeling, pattern recognition, and machine learning. It holds that the best explanation, given a limited set of observed data, is the one that permits the greatest compression of the data. MDL methods are particularly well-suited for dealing with model selection, prediction, and estimation problems in situations where the models under consideration can be arbitrarily complex, and overfitting the data is a serious concern. This extensive, step-by-step introduction to the MDL Principle provides a comprehensive reference (with an emphasis on conceptual issues) that is accessible to graduate students and researchers in statistics, pattern classification, machine learning, and data mining, to philosophers interested in the foundations of statistics, and to researchers in other applied sciences that involve model selection, including biology, econometrics, and experimental psychology. Part I provides a basic introduction to MDL and an overview of the concepts in statistics and information theory needed to understand MDL. Part II treats universal coding, the information-theoretic notion on which MDL is built, and part III gives a formal treatment of MDL theory as a theory of inductive inference based on universal coding. Part IV provides a comprehensive overview of the statistical theory of exponential families with an emphasis on their information-theoretic properties. The text includes a number of summaries, paragraphs offering the reader a "fast track" through the material, and boxes highlighting the most important concepts.
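The idea that the best explanation is the one permitting the greatest compression can be sketched with a toy model-selection problem. This is an illustrative assumption, not a method taken from the book: it compares a fixed fair-coin code against a crude two-part code for a Bernoulli model, charging (1/2) log2 n bits for the estimated parameter. All names are hypothetical.

```python
import math

def fair_coin_code_length(bits):
    """Fair-coin model: exactly one bit per outcome, no parameter to encode."""
    return float(len(bits))

def two_part_code_length(bits):
    """Crude two-part code: parameter cost plus data cost at the ML estimate."""
    n = len(bits)
    ones = sum(bits)
    p = ones / n
    data_cost = 0.0
    for count, prob in ((ones, p), (n - ones, 1 - p)):
        if count > 0:  # skip zero counts to avoid log2(0)
            data_cost -= count * math.log2(prob)
    param_cost = 0.5 * math.log2(n)  # assumed precision for one real parameter
    return param_cost + data_cost

def select_model(bits):
    """Return the name of the model with the shortest total description length."""
    lengths = {
        "fair_coin": fair_coin_code_length(bits),
        "bernoulli": two_part_code_length(bits),
    }
    return min(lengths, key=lengths.get), lengths

if __name__ == "__main__":
    print(select_model([1] * 90 + [0] * 10))  # skewed data
    print(select_model([0, 1] * 50))          # balanced data
```

With skewed data the richer model pays for its parameter and still compresses better; with balanced data the parameter cost is not recovered, so the simpler model is preferred. This is the guard against overfitting that the abstract describes.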

  • Advances in Minimum Description Length: Theory and Applications
    2005
    Co-Authors: Peter Grunwald, In Jae Myung, Mark A Pitt
    Abstract:

    The process of inductive inference -- to infer general laws and principles from particular instances -- is the basis of statistical modeling, pattern recognition, and machine learning. The Minimum Description Length (MDL) principle, a powerful method of inductive inference, holds that the best explanation, given a limited set of observed data, is the one that permits the greatest compression of the data -- that the more we are able to compress the data, the more we learn about the regularities underlying the data. Advances in Minimum Description Length is a sourcebook that will introduce the scientific community to the foundations of MDL, recent theoretical advances, and practical applications. The book begins with an extensive tutorial on MDL, covering its theoretical underpinnings, practical implications as well as its various interpretations, and its underlying philosophy. The tutorial includes a brief history of MDL -- from its roots in the notion of Kolmogorov complexity to the beginning of MDL proper. The book then presents recent theoretical advances, introducing modern MDL methods in a way that is accessible to readers from many different scientific fields. The book concludes with examples of how to apply MDL in research settings that range from bioinformatics and machine learning to psychology.

Lawrence B. Holder - One of the best experts on this subject based on the ideXlab platform.

  • Substructure Discovery Using Minimum Description Length and Background Knowledge
    arXiv: Artificial Intelligence, 1994
    Co-Authors: Diane J. Cook, Lawrence B. Holder
    Abstract:

    The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the Minimum Description Length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the Minimum Description Length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUBDUE's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain. Description of Online Appendix: This is a compressed tar file containing the SUBDUE discovery system, written in C. The program accepts as input databases represented in graph form, and will output discovered substructures with their corresponding value.

  • Substructure Discovery Using Minimum Description Length and Background Knowledge
    Journal of Artificial Intelligence Research, 1993
    Co-Authors: Diane J. Cook, Lawrence B. Holder
    Abstract:

    The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the Minimum Description Length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the Minimum Description Length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUBDUE's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain.

Jaakko Astola - One of the best experts on this subject based on the ideXlab platform.

  • Inference of Gene Regulatory Networks Based on a Universal Minimum Description Length
    EURASIP Journal on Bioinformatics and Systems Biology, 2008
    Co-Authors: John Dougherty, Ioan Tabus, Jaakko Astola
    Abstract:

    The Boolean network paradigm is a simple and effective way to interpret genomic systems, but discovering the structure of these networks remains a difficult task. The Minimum Description Length (MDL) principle has already been used for inferring genetic regulatory networks from time-series expression data and has proven useful for recovering the directed connections in Boolean networks. However, the existing method uses an ad hoc measure of description length that necessitates a tuning parameter for artificially balancing the model and error costs and, as a result, directly conflicts with the MDL principle's implied universality. In order to surpass this difficulty, we propose a novel MDL-based method in which the description length is a theoretical measure derived from a universal normalized maximum likelihood model. The search space is reduced by applying an implementable analogue of Kolmogorov's structure function. The performance of the proposed method is demonstrated on random synthetic networks, for which it is shown to improve upon previously published network inference algorithms with respect to both speed and accuracy. Finally, it is applied to time-series Drosophila gene expression measurements.

Wenbin Liu - One of the best experts on this subject based on the ideXlab platform.

  • Using the Minimum Description Length Principle to Reduce the Rate of False Positives of Best-Fit Algorithms
    EURASIP Journal on Bioinformatics and Systems Biology, 2014
    Co-Authors: Jie Fang, Edward R Dougherty, Hongjia Ouyang, Liangzhong Shen, Wenbin Liu
    Abstract:

    The inference of gene regulatory networks is a core problem in systems biology. Many inference algorithms have been proposed and all suffer from false positives. In this paper, we use the Minimum Description Length (MDL) principle to reduce the rate of false positives for best-fit algorithms. The performance of these algorithms is evaluated via two metrics: the normalized-edge Hamming distance and the steady-state distribution distance. Results for synthetic networks and a well-studied budding-yeast cell cycle network show that MDL-based filtering is more effective than filtering based on conditional mutual information (CMI). In addition, MDL-based filtering provides better inference than the MDL algorithm itself.

Tanya Y Berger-Wolf - One of the best experts on this subject based on the ideXlab platform.

  • Network Model Selection Using Task-Focused Minimum Description Length
    arXiv: Artificial Intelligence, 2017
    Co-Authors: Ivan Brugere, Tanya Y Berger-Wolf
    Abstract:

    Networks are fundamental models for data used in practically every application domain. In most instances, several implicit or explicit choices about the network definition impact the translation of underlying data to a network representation, and the subsequent question(s) about the underlying system being represented. Users of downstream network data may not even be aware of these choices or their impacts. We propose a task-focused network model selection methodology which addresses several key challenges. Our approach constructs network models from underlying data and uses Minimum Description Length (MDL) criteria for selection. Our methodology measures efficiency, a general and comparable measure of the network's performance of a local (i.e. node-level) predictive task of interest. Selection on efficiency favors parsimonious (e.g. sparse) models to avoid overfitting and can be applied across arbitrary tasks and representations. We show stability, sensitivity, and significance testing in our methodology.