Policy Rollout Action Selection with Knowledge Gradient for Sensor Path Planning



This paper considers the problem of selecting the best action within a policy rollout algorithm. Policy rollout is an online computation method used in approximate dynamic programming. We apply two versions of the knowledge gradient (KG) policy to a sensor path-planning problem whose goal is to localize an emitter using only bearing measurements. To the authors’ knowledge, this is the first application of the KG policy in a policy rollout context. The KG policy performed comparably to the methods used in prior work while potentially being applicable to a wider range of problems.
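As an illustration only: the abstract does not specify which two KG variants were used, so the sketch below assumes the textbook KG formula for independent normal beliefs with a known sample-noise variance. Each candidate action's rollout value is treated as an unknown mean, and each simulated rollout of the base policy after that action is treated as one noisy sample of it. At every step KG chooses which action to simulate next, and once the simulation budget is spent the action with the highest posterior mean is selected. The names `kg_values`, `select_action`, and `simulate`, as well as the prior and noise parameters, are hypothetical and not taken from the paper.

```python
import math
import random


def _normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_values(mu, var, noise_var):
    """KG value of one more rollout sample for each action.

    Assumes independent normal beliefs N(mu[x], var[x]) with var[x] > 0,
    a known sample-noise variance, and at least two actions.
    """
    n = len(mu)
    kg = []
    for x in range(n):
        # Std. dev. of the change in the posterior mean after one more sample:
        # sqrt(var - var_next) = var / sqrt(var + noise_var).
        sigma_tilde = var[x] / math.sqrt(var[x] + noise_var)
        best_other = max(mu[y] for y in range(n) if y != x)
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        # Expected improvement in the best posterior mean: sigma_tilde * f(zeta).
        kg.append(sigma_tilde * (zeta * _normal_cdf(zeta) + _normal_pdf(zeta)))
    return kg


def select_action(simulate, n_actions, budget,
                  prior_mu=0.0, prior_var=1e4, noise_var=1.0):
    """Spend `budget` rollout simulations, chosen by KG, then pick an action.

    `simulate(a)` is a hypothetical callback that runs the base policy after
    taking action `a` and returns a noisy estimate of its value (reward).
    """
    mu = [prior_mu] * n_actions
    var = [prior_var] * n_actions
    for _ in range(budget):
        kg = kg_values(mu, var, noise_var)
        x = max(range(n_actions), key=lambda i: kg[i])  # most informative action
        y = simulate(x)
        # Conjugate normal update of the chosen action's belief.
        precision = 1.0 / var[x] + 1.0 / noise_var
        mu[x] = (mu[x] / var[x] + y / noise_var) / precision
        var[x] = 1.0 / precision
    best = max(range(n_actions), key=lambda i: mu[i])
    return best, mu, var


if __name__ == "__main__":
    rng = random.Random(0)
    true_values = [0.0, 1.0, 1.5, 0.5]  # hypothetical rollout values per action

    def noisy_rollout(a):
        return rng.gauss(true_values[a], 1.0)

    best, mu, var = select_action(noisy_rollout, n_actions=4, budget=50)
    print("chosen action:", best)
    print("posterior means:", [round(m, 2) for m in mu])
```

In a sensor path-planning setting, `simulate` would correspond to rolling out the base heading policy from the state reached by a candidate heading and scoring the result, for example by negative localization error; that mapping is likewise an assumption made for illustration.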