Stochastic Modeling and Applied Research of Technology

PDF

Keywords

Gaussian two-armed bandit
UCB strategies
Bayesian and minimax approaches
batch processing
Monte-Carlo simulations
dynamic programming

How to Cite

Ershov, M., Kolnogorov, A., & Voroshilov, A. (2023). Customization of the Auer-Cesa-Bianchi-Fisher UCB Strategy for a Gaussian Two-Armed Bandit. Stochastic Modeling and Applied Research of Technology, 3, 1-13. https://doi.org/10.57753/SMARTY.2023.40.11.001

Abstract

We consider a two-armed bandit problem in relation to data processing if there are two alternative processing methods with different a priori unknown efficiencies. One has to determine the most efficient method and ensure its preferential use. We consider batch data processing when all the data is divided into batches. For the control, we present the batch version of the UCB strategy which was first introduced by P. Auer, N. Cesa-Bianchi and P. Fisher. We develop two approaches to the invariant description of the control process on the horizon equal to one. The first approach allows us to compute a regret using Monte-Carlo simulations and the second approach provides the analytical formalism for solving a recursive Bellman-type dynamic programming equation. Numerical results show the high efficiency of the presented strategy.

https://doi.org/10.57753/SMARTY.2023.40.11.001
PDF
Creative Commons Attribution License Logo

This work is licensed under a Creative Commons Attribution 4.0 International License.