## Speaker:

**Abstract:** We consider a periodic-review, single-product inventory system with a fixed ordering cost under censored demand. Under full information about the demand distribution, it is well known that the celebrated $(s,S)$ policy is optimal. In this paper, we assume the firm does not know the demand distribution a priori and makes adaptive inventory ordering decisions in each period based only on past sales (a.k.a. censored demand) data. The standard performance measure is regret, defined as the difference in cost between a feasible learning algorithm and the clairvoyant (full-information) benchmark. Compared with prior literature, the key difficulty of this problem lies in the loss of joint convexity of the objective function, due to the presence of the fixed cost. We develop a nonparametric learning algorithm, termed the $(\delta, S)$ policy, that combines the strengths of stochastic gradient descent, bandit controls, and simulation-based methods in a seamless and non-trivial fashion. We prove that the cumulative regret is $O(\sqrt{T}\log T)$, which is provably tight up to a logarithmic factor. We also develop several technical results that are of independent interest. We believe that the framework developed here could be widely applied to learning in other important stochastic systems with partial convexity in their objectives.
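To make the setting concrete, here is a minimal sketch of the full-information $(s,S)$ benchmark with censored sales observations. It is an illustration only, not the $(\delta, S)$ learning algorithm from the talk. The function name `simulate_sS`, the cost parameters `K`, `h`, and `p`, and the modeling choices (zero lead time, lost sales, no per-unit ordering cost, ordering when inventory is at or below `s`) are all assumptions made for this example.

```python
def simulate_sS(s, S, demands, K=10.0, h=1.0, p=4.0, start=0):
    """Run an (s, S) policy on a demand sequence (assumes s < S).

    Returns (total_cost, sales), where sales[t] = min(demand, inventory)
    is the censored observation available to a learner in period t.
    """
    inventory = start
    total_cost = 0.0
    sales = []
    for d in demands:
        # Order up to S only when inventory has dropped to s or below.
        if inventory <= s:
            total_cost += K  # fixed cost charged once per order (assumed)
            inventory = S
        # Censoring: the firm sees units sold, not the true demand d.
        sold = min(d, inventory)
        sales.append(sold)
        # Holding cost on leftover units, penalty on lost sales (assumed).
        total_cost += h * (inventory - sold) + p * (d - sold)
        inventory -= sold
    return total_cost, sales
```

For example, `simulate_sS(2, 5, [3, 5, 2, 6])` records sales of 3 in the last period even though demand was 6, which is exactly the information loss that censoring introduces. Regret would compare the cost of a learning policy against the cost of the best such $(s,S)$ policy on the same demand sequence.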

**Bio:** Cong Shi is an associate professor of Industrial and Operations Engineering (IOE) at the University of Michigan. His research focuses on the design of efficient algorithms with theoretical performance guarantees for stochastic optimization models in operations management. His main application areas include inventory control, supply chain management, revenue management, and service operations. He received his Ph.D. in Operations Research from MIT in 2012 and his B.S. in Mathematics from the National University of Singapore in 2007. He won first place in the 2009 INFORMS George Nicholson Student Paper Competition and third place in the 2017 INFORMS Junior Faculty Interest Group (JFIG) Paper Competition.