Abstract
We consider a two-armed bandit problem in relation to data processing if there are two alternative processing methods with different a priori unknown efficiencies. One has to determine the most efficient method and ensure its preferential use. We consider batch data processing when all the data is divided into batches. For the control, we present the batch version of the UCB strategy which was first introduced by P. Auer, N. Cesa-Bianchi and P. Fisher. We develop two approaches to the invariant description of the control process on the horizon equal to one. The first approach allows us to compute a regret using Monte-Carlo simulations and the second approach provides the analytical formalism for solving a recursive Bellman-type dynamic programming equation. Numerical results show the high efficiency of the presented strategy.
This work is licensed under a Creative Commons Attribution 4.0 International License.