Training models that perform well under distribution shifts is a central challenge in machine learning. In this paper, we introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution. We employ the principle of minimum discriminating information to embed the available prior knowledge, and use distributionally robust optimization to account for uncertainty due to the limited samples. By leveraging large deviation results, we obtain explicit generalization bounds with respect to the unknown shifted distribution. Finally, we demonstrate the versatility of our framework by applying it to two rather distinct applications: (1) training classifiers on systematically biased data and (2) off-policy evaluation in Markov Decision Processes.
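As a brief illustrative sketch of the minimum discriminating information step (the notation $P_0$, $\phi$, and $\mu$ is ours, chosen for exposition rather than taken from the paper): given a reference model $P_0$ and structural knowledge of the shifted distribution expressed, say, as a moment constraint, the principle selects
\[
Q^\star \;=\; \operatorname*{arg\,min}_{Q:\; \mathbb{E}_Q[\phi(X)] = \mu} \; D_{\mathrm{KL}}\bigl(Q \,\|\, P_0\bigr),
\]
i.e., the distribution consistent with the prior knowledge that is least discriminable from $P_0$ in the Kullback--Leibler sense.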