Paper
Optimal Policies for a Class of Restless Multiarmed Bandit Scheduling Problems with Applications to Sensor Management
Publication
Volume Number: 3
Issue Number: 1
Pages: 3-13
Publication Date: June 2008
Abstract
We present verifiable sufficient conditions for determining optimal policies for finite-horizon, discrete-time Markov decision problems (MDPs) with terminal reward. In particular, a control policy is optimal for the MDP if (i) it is optimal at the terminal time, (ii) immediate decisions can be deferred to future times, and (iii) the probability transition functions are commutative with respect to different decisions. The result applies to a class of finite-horizon restless multiarmed bandit problems that arise in sensor management applications, which we illustrate with a pair of examples.
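Condition (iii) can be made concrete with a small numerical check. The sketch below is not taken from the paper: it assumes that "commutative" can be read as the transition matrices of two decisions commuting under matrix multiplication, so that applying decision A and then B yields the same state distribution as applying B and then A. The matrices P_A, P_B, and P_C and the helper names are hypothetical, chosen only for illustration.

```python
# Illustrative only: the matrices below and the matrix-product reading of
# "commutative" are assumptions, not definitions taken from the paper.

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commute(p, q, tol=1e-12):
    """Return True if p*q equals q*p entrywise, within tolerance tol."""
    pq, qp = matmul(p, q), matmul(q, p)
    n = len(p)
    return all(abs(pq[i][j] - qp[i][j]) <= tol
               for i in range(n) for j in range(n))

# Hypothetical two-state transition matrices for two decisions.
# Both are symmetric with equal diagonal entries, so they commute.
P_A = [[0.9, 0.1], [0.1, 0.9]]
P_B = [[0.6, 0.4], [0.4, 0.6]]

# A third hypothetical matrix that does not commute with P_A.
P_C = [[1.0, 0.0], [0.5, 0.5]]

print(commute(P_A, P_B))  # order of decisions A and B does not matter
print(commute(P_A, P_C))  # order of decisions A and C does matter
```

Under this reading, a pair such as P_A and P_B would satisfy the commutativity condition, while P_A and P_C would not. Whether this matches the paper's formal definition should be checked against the full text.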