Multi-Armed Bandit with Endogenous Learning and Queueing: An Application to Split Liver Transplantation

Abstract: Split liver transplantation (SLT) is a procedure that saves two lives by dividing one liver and transplanting the sections into two recipients. Despite SLT’s potential to relieve the acute shortage of donated livers in the US, it is rarely used, in part because few US surgeons have learned to perform it. One barrier for young surgeons is that becoming proficient requires performing actual SLT surgeries, and these early surgeries have a lower success rate. Further, because SLT is a delicate operation, some medical teams may have only mixed success even with practice. Capturing the many important facets of the problem in a parsimonious yet sound model is crucial if we hope to analyze it efficiently. To this end, we extend the classic multi-armed bandit (MAB) model by embedding learning curves in the reward functions, to address the trade-off between discovering and developing talent (exploration) and utilizing a defined group of already-skilled surgeons (exploitation). To solve our MAB learning model, we propose the L-UCB, FL-UCB, and QFL-UCB algorithms, all variants of the upper confidence bound (UCB) algorithm enhanced with additional features important to the SLT problem, such as queueing dynamics, fairness, and arm dependence. We prove that the regrets of our algorithms, that is, the losses in total reward due to lack of information about surgeons’ aptitudes, are bounded by O(log T). We also show that they have superior numerical performance compared to standard bandit algorithms in settings where learning and queueing exist. From a methodological point of view, our proposed MAB model and algorithms are generic and have broad application prospects. From an application standpoint, our algorithms could be applied to help evaluate potential strategies to increase the proliferation of SLT and other technically difficult medical procedures.
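For readers unfamiliar with the setting, the following is a minimal sketch of the textbook UCB1 rule applied to arms whose success probability improves with practice. It is not the L-UCB, FL-UCB, or QFL-UCB algorithm from the talk, whose details the abstract does not give. The learning-curve form in `success_prob`, its `rate` parameter, and all function names are assumptions made purely for illustration.

```python
import math
import random


def success_prob(aptitude, n_prior, rate=0.3):
    """Hypothetical learning curve (an assumption, not from the talk):
    success probability starts at aptitude / 2 and rises toward
    `aptitude` as the number of prior procedures n_prior grows."""
    return aptitude * (1.0 - 0.5 * math.exp(-rate * n_prior))


def ucb_index(mean, pulls, t):
    """Textbook UCB1 index: empirical mean plus an exploration bonus."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(aptitudes, horizon, seed=0):
    """Assign `horizon` procedures, one per round, to surgeons (arms).

    Returns the number of procedures and successes per surgeon.
    """
    rng = random.Random(seed)
    k = len(aptitudes)
    pulls = [0] * k
    wins = [0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # try every arm once before using the index
        else:
            arm = max(
                range(k),
                key=lambda i: ucb_index(wins[i] / pulls[i], pulls[i], t),
            )
        # The reward depends on how much this arm has already practiced.
        p = success_prob(aptitudes[arm], pulls[arm])
        wins[arm] += 1 if rng.random() < p else 0
        pulls[arm] += 1
    return pulls, wins


if __name__ == "__main__":
    pulls, wins = run_ucb1([0.9, 0.6, 0.3], horizon=500)
    print("procedures per surgeon:", pulls)
    print("successes per surgeon:", wins)
```

Because early outcomes understate an arm's eventual success rate under this kind of curve, a plain empirical mean can undervalue a talented but inexperienced surgeon, which is the difficulty the abstract says motivates embedding learning curves in the reward functions.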

Short Bio: Yanhan (Savannah) Tang is an Operations Management Ph.D. candidate at the Tepper School of Business, Carnegie Mellon University. Her research addresses emerging problems in healthcare and financial services using methods from fields including queueing theory, optimal control, and machine learning.