Sequential Decision Making

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The experts below are selected from a list of 19,227 experts worldwide, ranked by the ideXlab platform

Paul Schrater - One of the best experts on this subject based on the ideXlab platform.

  • Structure Learning in Human Sequential Decision Making
    PLOS Computational Biology, 2010
    Co-Authors: Daniel E Acuna, Paul Schrater
    Abstract:

    Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of the model of how rewards and events are generated in the environment. Rather than being suboptimal, we argue that the learning problem humans face is more complex, in that it also involves learning the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people exhibit structure learning, we performed experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate humans can perform structure learning in a near-optimal manner.
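The mixture-of-bandit-models idea above can be sketched with conjugate Beta–Bernoulli updates: the agent maintains a posterior over which reward structure generated its observations. This is an illustrative sketch, not the authors' implementation; the two candidate structures (independent arms vs. a coupled "one-armed" model in which the second arm pays off at rate 1 − p) and all names are assumptions.

```python
import math

def beta_fn(a, b):
    # Beta function computed via log-gamma for numerical stability.
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def structure_posterior(s1, f1, s2, f2, a=1.0, b=1.0):
    """Posterior over two candidate reward structures of a 2-arm bandit,
    given success/failure counts (s_i, f_i) per arm and a Beta(a, b) prior."""
    # Independent ("two-armed") model: each arm has its own payoff rate.
    m_indep = (beta_fn(a + s1, b + f1) / beta_fn(a, b)) * \
              (beta_fn(a + s2, b + f2) / beta_fn(a, b))
    # Coupled ("one-armed") model: arm 2 pays at rate 1 - p, so arm-1
    # successes and arm-2 failures are evidence for the same rate p.
    m_coupled = beta_fn(a + s1 + f2, b + f1 + s2) / beta_fn(a, b)
    z = m_indep + m_coupled  # equal prior weight on both structures
    return m_indep / z, m_coupled / z

# Anti-correlated observations favor the coupled structure.
p_indep, p_coupled = structure_posterior(s1=8, f1=2, s2=1, f2=9)
```

Under anti-correlated data (arm 1 mostly pays, arm 2 mostly does not), the marginal likelihood of the coupled model dominates, so an optimal learner shifts belief toward the one-armed structure rather than treating the arms independently.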

  • Structure Learning in Human Sequential Decision Making
    Neural Information Processing Systems, 2008
    Co-Authors: Daniel E Acuna, Paul Schrater
    Abstract:

    We use graphical models and structure learning to explore how people learn policies in sequential decision-making tasks. Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that knows the graph model that generates reward in the environment. We argue that the learning problem humans face also involves learning the graph structure for reward generation in the environment. We formulate the structure learning problem using mixtures of reward models, and solve the optimal action selection problem using Bayesian reinforcement learning. We show that structure learning in one- and two-armed bandit problems produces many of the qualitative behaviors deemed suboptimal in previous studies. Our argument is supported by the results of experiments that demonstrate humans rapidly learn and exploit new reward structure.

John W Fisher - One of the best experts on this subject based on the ideXlab platform.

  • Variational Information Planning for Sequential Decision Making
    International Conference on Artificial Intelligence and Statistics, 2019
    Co-Authors: Jason Pacheco, John W Fisher
    Abstract:

    We consider the setting of sequential decision making where, at each stage, potential actions are evaluated based on expected reduction in posterior uncertainty, given by mutual information (MI). As MI typically lacks a closed form, we propose an approach which maintains variational approximations of both the posterior and the MI utility. Our planning objective extends an established variational bound on MI to the setting of sequential planning. The result, variational information planning (VIP), is an efficient method for sequential decision making. We further establish convexity of the variational planning objective and, under conditional exponential family approximations, we show that the optimal MI bound arises from a relaxation of the well-known exponential family moment-matching property. We demonstrate VIP for sensor selection, experiment design, and active learning, where it meets or exceeds methods requiring more computation, or those specialized to the task.
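As a toy illustration of the MI-based planning objective, consider a Gaussian case where MI is available in closed form (VIP itself targets the harder case where it is not, via variational bounds). The sensor names and parameters below are invented for the example.

```python
import math

def gaussian_mi(prior_var, gain, noise_var):
    """MI between a Gaussian latent x ~ N(0, prior_var) and a linear
    observation y = gain * x + noise, with noise ~ N(0, noise_var)."""
    return 0.5 * math.log(1.0 + gain ** 2 * prior_var / noise_var)

def select_sensor(prior_var, sensors):
    """One greedy information-planning step: pick the sensor whose
    observation carries the most mutual information about the state."""
    return max(sensors,
               key=lambda s: gaussian_mi(prior_var, s["gain"], s["noise_var"]))

sensors = [
    {"name": "coarse", "gain": 1.0, "noise_var": 4.0},
    {"name": "fine",   "gain": 1.0, "noise_var": 0.25},
    {"name": "biased", "gain": 0.1, "noise_var": 0.01},
]
best = select_sensor(prior_var=1.0, sensors=sensors)
```

Each stage of a sequential planner would repeat this selection after conditioning the prior variance on the chosen observation; VIP replaces the exact MI here with a variational bound when the posterior is not Gaussian.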

Daniel E Acuna - One of the best experts on this subject based on the ideXlab platform.

  • Structure Learning in Human Sequential Decision Making
    PLOS Computational Biology, 2010
    Co-Authors: Daniel E Acuna, Paul Schrater
    Abstract:

    Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of the model of how rewards and events are generated in the environment. Rather than being suboptimal, we argue that the learning problem humans face is more complex, in that it also involves learning the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people exhibit structure learning, we performed experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate humans can perform structure learning in a near-optimal manner.

  • Structure Learning in Human Sequential Decision Making
    Neural Information Processing Systems, 2008
    Co-Authors: Daniel E Acuna, Paul Schrater
    Abstract:

    We use graphical models and structure learning to explore how people learn policies in sequential decision-making tasks. Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that knows the graph model that generates reward in the environment. We argue that the learning problem humans face also involves learning the graph structure for reward generation in the environment. We formulate the structure learning problem using mixtures of reward models, and solve the optimal action selection problem using Bayesian reinforcement learning. We show that structure learning in one- and two-armed bandit problems produces many of the qualitative behaviors deemed suboptimal in previous studies. Our argument is supported by the results of experiments that demonstrate humans rapidly learn and exploit new reward structure.

Jason Pacheco - One of the best experts on this subject based on the ideXlab platform.

  • Variational Information Planning for Sequential Decision Making
    International Conference on Artificial Intelligence and Statistics, 2019
    Co-Authors: Jason Pacheco, John W Fisher
    Abstract:

    We consider the setting of sequential decision making where, at each stage, potential actions are evaluated based on expected reduction in posterior uncertainty, given by mutual information (MI). As MI typically lacks a closed form, we propose an approach which maintains variational approximations of both the posterior and the MI utility. Our planning objective extends an established variational bound on MI to the setting of sequential planning. The result, variational information planning (VIP), is an efficient method for sequential decision making. We further establish convexity of the variational planning objective and, under conditional exponential family approximations, we show that the optimal MI bound arises from a relaxation of the well-known exponential family moment-matching property. We demonstrate VIP for sensor selection, experiment design, and active learning, where it meets or exceeds methods requiring more computation, or those specialized to the task.

Jürgen Schmidhuber - One of the best experts on this subject based on the ideXlab platform.

  • Sequential Decision Making Based on Direct Search
    Lecture Notes in Computer Science, 2000
    Co-Authors: Jürgen Schmidhuber
    Abstract:

    The most challenging open issues in sequential decision making include partial observability of the decision maker's environment, hierarchical and other types of abstract credit assignment, the learning of credit assignment algorithms, and exploration without a priori world models. I will summarize why direct search (DS) in policy space provides a more natural framework for addressing these issues than reinforcement learning (RL) based on value functions and dynamic programming. Then I will point out fundamental drawbacks of traditional DS methods in the case of stochastic environments, stochastic policies, and unknown temporal delays between actions and observable effects. I will discuss a remedy called the success-story algorithm, show how it can outperform traditional DS, and mention a relationship to market models combining certain aspects of DS and traditional RL.
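A minimal sketch of direct search in policy space, in the spirit described above: plain hill-climbing on a parametrized stochastic policy for a two-armed bandit, evaluated by sampled return with no value function. This is not the success-story algorithm; the policy form, reward rates, and names are illustrative assumptions.

```python
import random

def rollout_return(theta, steps=100, rng=random):
    """Sampled return of a stochastic policy on a two-armed bandit:
    arm 0 pays with prob 0.3, arm 1 with prob 0.7; the policy picks
    arm 1 with probability theta."""
    total = 0
    for _ in range(steps):
        arm = 1 if rng.random() < theta else 0
        pay_prob = 0.7 if arm == 1 else 0.3
        total += 1 if rng.random() < pay_prob else 0
    return total

def direct_search(iters=200, seed=0):
    """Hill-climbing directly in policy space: perturb theta and keep
    the perturbation whenever the sampled return improves. No value
    function or dynamic programming is involved."""
    rng = random.Random(seed)
    theta, best = 0.5, -1
    for _ in range(iters):
        cand = min(1.0, max(0.0, theta + rng.gauss(0, 0.1)))
        r = rollout_return(cand, rng=rng)
        if r > best:
            theta, best = cand, r
    return theta
```

Because returns are sampled, the comparison is noisy; this is exactly the drawback in stochastic settings that the abstract points to, which the success-story algorithm is designed to remedy.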

  • Sequence Learning - Sequential Decision Making Based on Direct Search
    Sequence Learning, 2000
    Co-Authors: Jürgen Schmidhuber
    Abstract:

    The most challenging open issues in sequential decision making include partial observability of the decision maker's environment, hierarchical and other types of abstract credit assignment, the learning of credit assignment algorithms, and exploration without a priori world models. I will summarize why direct search (DS) in policy space provides a more natural framework for addressing these issues than reinforcement learning (RL) based on value functions and dynamic programming. Then I will point out fundamental drawbacks of traditional DS methods in the case of stochastic environments, stochastic policies, and unknown temporal delays between actions and observable effects. I will discuss a remedy called the success-story algorithm, show how it can outperform traditional DS, and mention a relationship to market models combining certain aspects of DS and traditional RL.