📝 Regret Tail Characterization of Optimal Bandit Algorithms with Generic Rewards

Published in Pre-print, 2026

Recommended citation: https://drive.google.com/file/d/1d5bjPzPTV-u1idNil5g80IeTZwb54iww/view?usp=sharing

👥 Co-authors: Shubhada Agrawal

We consider the regret minimization problem in a stochastic multi-armed bandit setup, where the classical goal is to design policies that minimize the so-called expected cumulative regret. While expected-regret guarantees are well understood, controlling the tail behavior of the regret is crucial for their use in safety-critical applications. Recent works highlight that certain optimal algorithms can be fragile in the sense that they may incur large regret with non-negligible probability. However, current analysis has so far been confined mainly to simpler settings within the single parametric exponential family of reward distributions. Thus, the principal aim of our work is to analyze such regret-tail behavior of optimal bandit algorithms in a relatively broader setting: policies that are optimized for generic families of reward distributions under significantly weaker structural assumptions. For such general classes of reward distributions, we first show that a generalization of KLinf -UCB algorithm remains asymptotically optimal. We then analyze its regret tail behavior and show that, for the distribution class under consideration, optimal algorithms generally exhibit a weak fragility in general. However, when the additional condition of discrimination equivalence holds, this fragility intensifies to a strong form characterized by heavy (Cauchy-type) regret tails. Since several widely studied model classes, including moment-bounded and bounded-support distributions, are contained in our framework, these results imply that optimal bandit algorithms for such families are strongly fragile whenever discrimination equivalence is satisfied. Finally, in the absence of discrimination equivalence, we refine the regret-tail upper-bound analysis and establish that, for bounded and finitely supported distributions, the optimal algorithm exhibits only a strictly weaker, near-robust form of fragility. ⏩ Full Paper