Markov decision process


Shankar Sastry - One of the best experts on this subject based on the ideXlab platform.

  • Markov decision process routing games
    International Conference on Cyber-Physical Systems, 2017
    Co-Authors: Dan Calderone, Shankar Sastry
    Abstract:

    We explore an extension of nonatomic routing games that we call Markov decision process routing games, where each agent chooses a transition policy between nodes in a network rather than a path from an origin node to a destination node, i.e., each agent in the population solves a Markov decision process rather than a shortest path problem. We define the appropriate version of a Wardrop equilibrium as well as a potential function for this game in the finite horizon (total reward) case. This work can be thought of as a routing-game-based formulation of continuous population stochastic games (mean-field games or anonymous sequential games). We apply our model to the problem of ridesharing drivers competing for customers.
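
    The fixed-point structure described in this abstract, where each member of a continuum of agents solves a finite-horizon Markov decision process against congestion costs induced by the aggregate flows, can be illustrated with a short sketch. The toy network, the linear congestion model, the damped best-response loop, and all names below are illustrative assumptions, not the authors' formulation or code.

```python
# Sketch only: a toy "MDP routing game" fixed-point loop under assumed dynamics.
import numpy as np

n_states, n_actions, horizon = 3, 2, 4
rng = np.random.default_rng(0)

# Assumed transition kernel P[s, a, s'] (each row sums to 1) for a small toy network.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

base_cost = rng.uniform(1.0, 2.0, size=(n_states, n_actions))  # free-flow travel cost
congestion = 0.5                                               # cost added per unit of flow
mu0 = np.full(n_states, 1.0 / n_states)                        # initial population distribution

def solve_mdp(cost):
    """Backward induction: greedy policy pi[t, s] for a time-varying cost table."""
    V = np.zeros(n_states)
    pi = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        Q = cost[t] + P @ V          # Q[s, a] = immediate cost + expected cost-to-go
        pi[t] = Q.argmin(axis=1)
        V = Q.min(axis=1)
    return pi

def flows(pi):
    """Push the population forward under policy pi; return flow x[t, s, a]."""
    x = np.zeros((horizon, n_states, n_actions))
    mu = mu0.copy()
    for t in range(horizon):
        for s in range(n_states):
            x[t, s, pi[t, s]] = mu[s]
        mu = np.einsum('sa,sap->p', x[t], P)
    return x

# Damped best-response iteration toward a Wardrop-style equilibrium.
x = np.zeros((horizon, n_states, n_actions))
for k in range(50):
    cost = base_cost + congestion * x                 # costs induced by current flows
    x = x + (flows(solve_mdp(cost)) - x) / (k + 1)    # method-of-successive-averages step

print("state-action flows at t = 0:\n", x[0].round(3))
```

    The averaging step above is a generic successive-averages heuristic; the paper's potential-function characterization suggests more principled ways to compute the equilibrium.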

Thomas Dean - One of the best experts on this subject based on the ideXlab platform.

  • Bounded parameter Markov decision process
    Artificial Intelligence, 2000
    Co-Authors: Robert Givan, Sonia M Leach, Thomas Dean
    Abstract:

    In this paper, we introduce the notion of a bounded parameter Markov decision process as a generalization of the traditional exact MDP. A bounded parameter MDP is a set of exact MDPs specified by giving upper and lower bounds on transition probabilities and rewards (all the MDPs in the set share the same state and action space). Bounded parameter MDPs can be used to represent variation or uncertainty concerning the parameters of sequential decision problems. Bounded parameter MDPs can also be used in aggregation schemes to represent the variation in the transition probabilities for different base states aggregated together in the same aggregate state. We introduce interval value functions as a natural extension of traditional value functions. An interval value function assigns a closed real interval to each state, representing the assertion that the value of that state falls within that interval. An interval value function can be used to bound the performance of a policy over the set of exact MDPs associated with a given bounded parameter MDP. We describe an iterative dynamic programming algorithm called interval policy evaluation, which computes an interval value function for a given bounded parameter MDP and specified policy. Interval policy evaluation computes the most restrictive interval value function that is sound, i.e., one that bounds the value function of the given policy in every exact MDP in the set defined by the bounded parameter MDP. A simple modification of interval policy evaluation yields a variant of value iteration [Bellman57] that we call interval value iteration, which computes a policy for a bounded parameter MDP that is optimal in a well-defined sense.
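
    A rough sketch of the interval policy evaluation idea described above, under illustrative assumptions: a fixed policy (one action per state), a discounted criterion, and toy probability/reward intervals. The lower (upper) bound for each state is backed up using the transition vector inside the intervals that is least (most) favorable, via the standard greedy order-maximizing selection; none of this is the paper's code.

```python
# Sketch only: bounds on a fixed policy's value in a toy bounded parameter MDP.
import numpy as np

n = 3                                   # states; one fixed action per state (the policy)
P_lo = np.array([[0.1, 0.2, 0.5],       # assumed lower bounds on transition probabilities
                 [0.3, 0.1, 0.3],
                 [0.2, 0.2, 0.4]])
P_hi = np.array([[0.3, 0.4, 0.7],       # assumed upper bounds
                 [0.5, 0.3, 0.5],
                 [0.4, 0.4, 0.6]])
R_lo = np.array([0.0, 1.0, 2.0])        # reward intervals per state
R_hi = np.array([0.5, 1.5, 2.5])
gamma = 0.9

def extreme_transition(p_lo, p_hi, V, maximize):
    """Pick p in [p_lo, p_hi] with sum(p) == 1 that maximizes (or minimizes) p @ V."""
    order = np.argsort(V)[::-1] if maximize else np.argsort(V)
    p = p_lo.copy()
    slack = 1.0 - p.sum()               # probability mass still to distribute
    for j in order:                     # give slack to the most favourable states first
        add = min(slack, p_hi[j] - p_lo[j])
        p[j] += add
        slack -= add
    return p

V_lo = np.zeros(n)
V_hi = np.zeros(n)
for _ in range(200):                    # interval policy evaluation (discounted variant)
    V_lo = np.array([R_lo[s] + gamma * extreme_transition(P_lo[s], P_hi[s], V_lo, False) @ V_lo
                     for s in range(n)])
    V_hi = np.array([R_hi[s] + gamma * extreme_transition(P_lo[s], P_hi[s], V_hi, True) @ V_hi
                     for s in range(n)])

print([f"[{lo:.2f}, {hi:.2f}]" for lo, hi in zip(V_lo, V_hi)])
```

    The printed intervals bound the policy's value in every exact MDP consistent with the bounds; interval value iteration additionally maximizes over actions at each backup.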

Howard Jay Siegel - One of the best experts on this subject based on the ideXlab platform.

  • A partially observable Markov decision process approach to residential home energy management
    IEEE Transactions on Smart Grid, 2018
    Co-Authors: Timothy M Hansen, Edwin K. P. Chong, Siddharth Suryanarayanan, Anthony A Maciejewski, Howard Jay Siegel
    Abstract:

    Real-time pricing (RTP) is a utility-offered dynamic pricing program that incentivizes customers to change their energy usage. A home energy management system (HEMS) automates the energy usage in a smart home in response to utility pricing signals. We present three new HEMS techniques for minimizing the household electricity bill in such an RTP market: one myopic approach and two non-myopic partially observable Markov decision process (POMDP) approaches. In a simulation study, we compare the performance of the new HEMS methods with a mathematical lower bound and the status quo. We show that the non-myopic POMDP approach can provide a 10%–30% saving over the status quo.
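
    The POMDP ingredient in a setting like this can be sketched as belief tracking over a hidden price regime plus a scheduling rule. The sketch below shows only the belief-state bookkeeping and a myopic threshold decision, not the paper's HEMS model or its non-myopic lookahead; the regimes, noise level, and threshold are invented for illustration.

```python
# Sketch only: belief tracking over an assumed hidden price regime, with a myopic rule.
import numpy as np

regimes = np.array([0.10, 0.20, 0.40])          # assumed $/kWh for low/mid/high regimes
T = np.array([[0.80, 0.15, 0.05],               # assumed regime transition model
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
sigma = 0.05                                    # assumed noise on the announced price

def belief_update(b, observed_price):
    """One POMDP belief update: predict with T, then reweight by observation likelihood."""
    predicted = T.T @ b
    likelihood = np.exp(-0.5 * ((observed_price - regimes) / sigma) ** 2)
    b_new = predicted * likelihood
    return b_new / b_new.sum()

b = np.array([1/3, 1/3, 1/3])                   # uniform initial belief over regimes
for price in [0.12, 0.11, 0.35, 0.38]:          # example announced RTP signals
    b = belief_update(b, price)
    expected_price = b @ regimes
    action = "run deferrable load now" if expected_price < 0.18 else "defer load"
    print(f"price={price:.2f}  belief={np.round(b, 2)}  -> {action}")
```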

Finale Doshi-Velez - One of the best experts on this subject based on the ideXlab platform.

  • The infinite partially observable Markov decision process
    Neural Information Processing Systems, 2009
    Co-Authors: Finale Doshi-Velez
    Abstract:

    The partially observable Markov decision process (POMDP) framework has proven useful in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many real-world problems, both the structure and the parameters are difficult to specify from domain knowledge alone. Recent work in Bayesian reinforcement learning has made headway in learning POMDP models; however, this work has largely focused on learning the parameters of the POMDP model. We define an infinite POMDP (iPOMDP) model that does not require knowledge of the size of the state space; instead, it assumes that the number of visited states will grow as the agent explores its world and only models visited states explicitly. We demonstrate the iPOMDP on several standard problems.
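
    The core modeling idea, that the set of explicitly represented states grows only as the agent visits new situations, can be sketched with a Chinese-restaurant-process-style rule: existing states are reused in proportion to their visit counts, and a new state is instantiated with weight controlled by a concentration parameter. The toy below illustrates only that growth mechanism, not the iPOMDP's inference procedure, and all parameters are assumptions.

```python
# Sketch only: a CRP-style rule in which the instantiated state set grows lazily.
from collections import defaultdict
import random

random.seed(0)
alpha = 1.0                        # concentration: higher -> more willing to posit new states
visits = defaultdict(float)        # visit counts for the states instantiated so far
visits[0] = 1.0                    # start with a single known state

def sample_state():
    """Draw a state: existing ones in proportion to visits, a brand-new one with weight alpha."""
    total = sum(visits.values()) + alpha
    r = random.uniform(0.0, total)
    for s, c in visits.items():
        r -= c
        if r <= 0.0:
            return s
    return len(visits)             # index of a previously unvisited state

for step in range(50):             # as exploration proceeds, the state set grows only as needed
    s = sample_state()
    visits[s] += 1.0

print("states instantiated after 50 steps:", len(visits))
```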