Optimal Policies for a Class of Restless Multiarmed Bandit Scheduling Problems with Applications to Sensor Management

Journal of Advances in Information Fusion (JAIF)

Publication Date: 1 June 2008

R. Washburn, M. Schneider

Abstract

We present verifiable sufficient conditions for determining optimal policies for finite horizon, discrete time Markov decision problems (MDPs) with terminal reward. In particular, a control policy is optimal for the MDP if (i) it is optimal at the terminal time, (ii) immediate decisions can be deferred to future times, and (iii) the probability transition functions are commutative with respect to different decisions. The result applies to a class of finite horizon restless multiarmed bandit problems that arise in sensor management applications, which we illustrate with a pair of examples.
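As a rough illustration of condition (iii), the Python sketch below assumes a finite state space and reads "commutative" as P_a P_b = P_b P_a for the transition matrices of any two decisions. That reading, the function names (`transitions_commute`, `backward_induction`), and the toy matrices are assumptions made for illustration; they are not taken from the paper. The second function is ordinary backward induction for a finite-horizon problem whose only reward is collected at the terminal time, included as a baseline against which a candidate policy could be compared.

```python
from itertools import combinations


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transitions_commute(P, tol=1e-12):
    """Return True if P[a] P[b] == P[b] P[a] for every pair of decisions.

    P maps each decision label to a row-stochastic transition matrix.
    This is an assumed matrix reading of "commutative", not the paper's definition.
    """
    for a, b in combinations(P, 2):
        AB, BA = matmul(P[a], P[b]), matmul(P[b], P[a])
        for row_ab, row_ba in zip(AB, BA):
            if any(abs(x - y) > tol for x, y in zip(row_ab, row_ba)):
                return False
    return True


def backward_induction(P, terminal_reward, horizon):
    """Standard dynamic programming with reward only at the terminal time.

    Returns (V0, policy), where V0[s] is the optimal expected terminal reward
    starting from state s, and policy[t][s] is a maximizing decision at time t.
    """
    V = list(terminal_reward)
    policy = []
    for _ in range(horizon):
        # Expected next-step value of each decision in each state.
        Q = {a: [sum(p * v for p, v in zip(row, V)) for row in P[a]] for a in P}
        best = [max(P, key=lambda a: Q[a][s]) for s in range(len(V))]
        policy.append(best)
        V = [Q[best[s]][s] for s in range(len(V))]
    policy.reverse()  # steps were computed from the last decision time backward
    return V, policy


# Hypothetical two-state, two-decision examples.
commuting = {"a": [[0.9, 0.1], [0.1, 0.9]],
             "b": [[0.6, 0.4], [0.4, 0.6]]}
non_commuting = {"a": [[0.5, 0.5], [0.0, 1.0]],
                 "b": [[1.0, 0.0], [0.5, 0.5]]}

print(transitions_commute(commuting))
print(transitions_commute(non_commuting))
print(backward_induction(commuting, [1.0, 0.0], horizon=1))
```

The check covers only the commutativity condition under the stated reading; conditions (i) and (ii) depend on the specific policy and problem structure and are not modeled here.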