Title: Pessimistic Off-Policy Optimization for Learning to Rank

Jun 06, 2022

Matej Cief, Branislav Kveton, Michal Kompan

Abstract: Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended and thus logged much more frequently than others. This is further perpetuated when recommending a list of items, as the action space is combinatorial. To address this challenge, we study pessimistic off-policy optimization for learning to rank. The key idea is to compute lower confidence bounds on parameters of click models and then return the list with the highest pessimistic estimate of its value. This approach is computationally efficient and we analyze it. We study its Bayesian and frequentist variants, and overcome the limitation of unknown prior by incorporating empirical Bayes. To show the empirical effectiveness of our approach, we compare it to off-policy optimizers that use inverse propensity scores or neglect uncertainty. Our approach outperforms all baselines, is robust, and is also general.
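
The key idea in the abstract (lower confidence bounds on click-model parameters, then the list with the highest pessimistic value) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a simplified click model in which each item has an independent Bernoulli attraction probability and the value of a list increases with each of those probabilities, so ranking by per-item lower bound is optimal. It also substitutes a Hoeffding-style bound for whatever bound the paper uses. The names `lower_confidence_bound`, `pessimistic_ranking`, and the `logged` data are hypothetical.

```python
import math


def lower_confidence_bound(clicks, views, delta=0.05):
    """Hoeffding-style lower bound on a Bernoulli click probability.

    Assumption: this frequentist bound stands in for the paper's bounds.
    """
    if views == 0:
        return 0.0  # no data at all: fall back to the most pessimistic value
    mean = clicks / views
    width = math.sqrt(math.log(1.0 / delta) / (2.0 * views))
    return max(0.0, mean - width)


def pessimistic_ranking(stats, k, delta=0.05):
    """Return the k items with the highest lower bounds, best first."""
    lcbs = {
        item: lower_confidence_bound(clicks, views, delta)
        for item, (clicks, views) in stats.items()
    }
    return sorted(lcbs, key=lcbs.get, reverse=True)[:k]


# Hypothetical logged data: item -> (clicks, impressions).
logged = {
    "a": (500, 1000),  # well explored, empirical mean 0.50
    "b": (3, 4),       # barely explored, empirical mean 0.75
    "c": (300, 1000),  # well explored, empirical mean 0.30
}

print(pessimistic_ranking(logged, k=2))
```

In this toy data, ranking by empirical mean would place the rarely logged item "b" first, whereas the pessimistic ranking prefers items whose estimates are backed by many impressions.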
