Title: Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank

Jul 29, 2024

Shashank Gupta, Harrie Oosterhuis, Maarten de Rijke


Abstract: Counterfactual learning to rank (CLTR) can be risky; various circumstances can cause it to produce sub-optimal models that hurt performance when deployed. Safe CLTR was introduced to mitigate these risks when using inverse propensity scoring to correct for position bias. However, the existing safety measure for CLTR is not applicable to state-of-the-art CLTR, it cannot handle trust bias, and its guarantees rely on specific assumptions about user behavior. Our contributions are two-fold. First, we generalize the existing safe CLTR approach to make it applicable to state-of-the-art doubly robust (DR) CLTR and trust bias. Second, we propose a novel approach, proximal ranking policy optimization (PRPO), that provides safety in deployment without assumptions about user behavior. PRPO removes incentives for learning ranking behavior that is too dissimilar to a safe ranking model. Thereby, PRPO imposes a limit on how much learned models can degrade performance metrics, without relying on any specific user assumptions. Our experiments show that both our novel safe doubly robust method and PRPO provide higher performance than the existing safe inverse propensity scoring approach. However, when circumstances are unexpected, the safe doubly robust approach can become unsafe and bring detrimental performance. In contrast, PRPO always maintains safety, even in maximally adversarial situations. By avoiding assumptions, PRPO is the first method with unconditional safety in deployment that translates to robust safety for real-world applications.
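The idea of "removing incentives" to move too far from a safe ranking model can be illustrated with a clipped objective in the style of proximal policy optimization. The sketch below is only an illustration under stated assumptions, not the formulation from the paper: the function names, the per-document weights, the reward values, and the clipping bounds `eps_lo` and `eps_hi` are all hypothetical. Each document contributes its reward times the ratio between the new model's weight and the safe model's weight, and that ratio is capped once it moves past the bound in the direction the reward favors.

```python
def clipped_term(ratio, reward, eps_lo, eps_hi):
    """Contribution of one document: reward * ratio, with the ratio capped.

    Hypothetical illustration: a positive reward favors a larger ratio, so
    the ratio is capped at eps_hi; a negative reward favors a smaller ratio,
    so the ratio is floored at eps_lo. Past the bound the term is constant,
    so there is no further gain from moving away from the safe model.
    """
    if reward >= 0:
        return reward * min(ratio, eps_hi)
    return reward * max(ratio, eps_lo)


def clipped_objective(new_weights, safe_weights, rewards, eps_lo=0.8, eps_hi=1.2):
    """Sum of clipped terms, each scaled by the safe model's weight.

    new_weights / safe_weights: assumed per-document weights (for example,
    expected exposure) under the learned and the safe ranking model.
    rewards: assumed per-document reward estimates, which may be negative.
    """
    total = 0.0
    for w_new, w_safe, r in zip(new_weights, safe_weights, rewards):
        ratio = w_new / w_safe
        total += w_safe * clipped_term(ratio, r, eps_lo, eps_hi)
    return total
```

With these assumed bounds, a document with positive reward contributes the same amount whether its ratio is 1.5 or 2.0, so an optimizer gains nothing by increasing its weight beyond 1.2 times the safe model's weight. Because the cap applies regardless of how the rewards were estimated, a bound of this kind does not depend on a user behavior model.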

* Full paper at CIKM 2024
