Abstract: Motivated by the consideration of fairly sharing the cost of exploration between multiple groups in learning problems, we develop the Nash bargaining solution in the context of multi-armed bandits. Specifically, the ‘grouped’ bandit associated with any multi-armed bandit problem associates, with each time step, a single group from some finite set of groups. The utility gained by a given group under some learning policy is naturally viewed as the reduction in that group’s regret relative to the regret that group would have incurred ‘on its own’. We derive policies that yield the Nash bargaining solution relative to the set of incremental utilities possible under any policy. We show that on the one hand, the ‘price of fairness’ under such policies is limited, while on the other hand, regret optimal policies are arbitrarily unfair under generic conditions. Our theoretical development is complemented by a case study on contextual bandits for warfarin dosing where we are concerned with the cost of exploration across multiple races and age groups.
Paper: https://arxiv.org/abs/2106.02553
Bio: Jackie Baek is a PhD student in the Operations Research Center at MIT, where she is advised by Vivek Farias. Her research focuses on developing models and algorithms for decision making problems under uncertainty, focusing on practical concerns such as fairness. She completed her undergraduate studies at the University of Waterloo, where she majored in computer science and combinatorics and optimization. She has interned at several tech companies, including Dropbox, Snap, Bloomberg, and Grail.