Customization of the Auer-Cesa-Bianchi-Fisher UCB Strategy for a Gaussian Two-Armed Bandit

Maxim Ershov; Alexander Kolnogorov; Albert Voroshilov

doi:10.57753/SMARTY.2023.40.11.001

Vol. 3 (2022), Research Article

Published August 8, 2023

Maxim Ershov

Yaroslav-the-Wise Novgorod State University, Bolshaya St. Petersburg str., 41, Veliky Novgorod, Russia

https://orcid.org/0000-0003-4900-278X

Alexander Kolnogorov

Yaroslav-the-Wise Novgorod State University, Bolshaya St. Petersburg str., 41, Veliky Novgorod, Russia

https://orcid.org/0000-0003-4203-8472

Albert Voroshilov

Yaroslav-the-Wise Novgorod State University, Bolshaya St. Petersburg str., 41, Veliky Novgorod, Russia

https://orcid.org/0000-0001-6588-6138

Keywords

Gaussian two-armed bandit
UCB strategies
Bayesian and minimax approaches
batch processing
Monte-Carlo simulations
dynamic programming

How to Cite

Ershov, M., Kolnogorov, A., & Voroshilov, A. (2023). Customization of the Auer-Cesa-Bianchi-Fisher UCB Strategy for a Gaussian Two-Armed Bandit. Stochastic Modeling and Applied Research of Technology, 3, 1-13. https://doi.org/10.57753/SMARTY.2023.40.11.001

Abstract

We consider the two-armed bandit problem as applied to data processing when two alternative processing methods with different, a priori unknown efficiencies are available. The goal is to identify the more efficient method and ensure its preferential use. We consider batch processing, in which all the data are divided into batches. For control, we present a batch version of the UCB strategy first introduced by P. Auer, N. Cesa-Bianchi and P. Fisher. We develop two approaches to an invariant description of the control process on a horizon equal to one. The first approach allows the regret to be computed by Monte-Carlo simulations; the second provides an analytical formalism for solving a recursive Bellman-type dynamic programming equation. Numerical results show the high efficiency of the presented strategy.
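
The batch setting and the Monte-Carlo estimate of the regret described above can be illustrated with a minimal sketch. It is not the authors' implementation. The unit-variance Gaussian rewards, the index form `mean + sqrt(a * ln(t) / n)`, the tuning constant `a`, the initial one-batch-per-arm phase, and the function name `batch_ucb_regret` are all assumptions made for illustration. The sketch does not reproduce the invariant unit-horizon description or the Bellman-type equation.

```python
import math
import random


def batch_ucb_regret(m1, m2, horizon=5000, batch=100, a=1.0, runs=200, seed=0):
    """Monte-Carlo estimate of the expected regret of a batch UCB rule.

    Assumptions (not from the source): unit-variance Gaussian rewards,
    the index mean + sqrt(a * ln(t) / n), one initial batch per arm.
    """
    rng = random.Random(seed)
    means = (m1, m2)
    best = max(means)
    total_regret = 0.0
    for _ in range(runs):
        counts = [0, 0]      # number of data items processed by each arm
        sums = [0.0, 0.0]    # cumulative reward collected by each arm
        t = 0
        while t < horizon:
            if counts[0] == 0:
                arm = 0      # first batch goes to arm 0
            elif counts[1] == 0:
                arm = 1      # second batch goes to arm 1
            else:
                # Upper confidence bounds, recomputed once per batch.
                index = [
                    sums[i] / counts[i] + math.sqrt(a * math.log(t) / counts[i])
                    for i in (0, 1)
                ]
                arm = 0 if index[0] >= index[1] else 1
            size = min(batch, horizon - t)
            # The sum of `size` unit-variance Gaussians is N(size * mean, size).
            sums[arm] += rng.gauss(size * means[arm], math.sqrt(size))
            counts[arm] += size
            t += size
            # Expected loss from processing this batch with the chosen arm.
            total_regret += (best - means[arm]) * size
    return total_regret / runs


if __name__ == "__main__":
    print(batch_ucb_regret(0.5, 0.0))
```

Because the chosen arm is fixed within a batch, only the batch total needs to be sampled, which is why a single Gaussian draw per batch suffices in this sketch.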

This work is licensed under a Creative Commons Attribution 4.0 International License.
