Abstract: A challenge that machine learning practitioners in industry face is selecting the best model to deploy in production. Because a model is often an intermediate component of a production system, online controlled experiments such as A/B tests yield the most reliable estimate of the effectiveness of the whole system, but they can compare only two or a few models at a time due to budget constraints. We propose an automated online experimentation mechanism that can efficiently perform model selection from a large pool of models with a small number of online experiments. From a Bayesian surrogate model trained on historical logs, we derive the probability distribution of the metric of interest, which captures model uncertainty. Our method efficiently identifies the best model by sequentially selecting and deploying a list of models from the candidate set in a way that balances exploration and exploitation. Using simulations based on real data, we demonstrate the effectiveness of our method on two different tasks.
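A minimal sketch of what such a selection loop could look like, assuming (hypothetically) that the Bayesian surrogate yields a Gaussian posterior over each candidate's online metric; Thompson sampling is one standard way to balance exploration and exploitation, though the paper's exact acquisition rule may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posteriors over each candidate model's online metric,
# e.g. derived from a Bayesian surrogate fit to historical logs.
posterior_mean = np.array([0.12, 0.15, 0.11, 0.14])
posterior_std = np.array([0.04, 0.05, 0.02, 0.06])

def thompson_select(mean, std):
    """One selection step: draw a plausible metric value for every candidate
    from its posterior and deploy the apparent best. Uncertain models win
    some draws, so the rule explores while favoring strong candidates."""
    samples = rng.normal(mean, std)
    return int(np.argmax(samples))

# Sequentially pick the next model to deploy; after each online experiment
# the posterior would be updated with the observed metric (omitted here).
next_model = thompson_select(posterior_mean, posterior_std)
print(f"deploy candidate model {next_model} in the next experiment")
```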
Abstract: Users of music streaming, video streaming, news recommendation, and e-commerce services often engage with content in a sequential manner. Providing and evaluating good sequences of recommendations is therefore a central problem for these services. Prior reweighting-based counterfactual evaluation methods either suffer from high variance or make strong independence assumptions about rewards. We propose a new counterfactual estimator that allows for sequential interactions among rewards while achieving lower variance and remaining asymptotically unbiased. Our method uses graphical assumptions about the causal structure of the slate to reweight the rewards observed under the logging policy so that they approximate the expected sum of rewards under the target policy. Extensive experiments in simulation and on a live recommender system show that our approach outperforms existing methods in terms of bias and data efficiency on the sequential track recommendation problem.
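As a rough illustration of reweighting that permits sequential reward interactions, here is a per-decision importance-sampling sketch (a standard variant, not necessarily the paper's estimator); `log_probs` and `target_probs` are hypothetical per-position propensities for the logged tracks:

```python
import numpy as np

def sequential_ips_value(log_probs, target_probs, rewards):
    """Off-policy estimate of the expected sum of slate rewards.

    Each array has shape (n_slates, slate_len). A position's weight is the
    cumulative product of probability ratios up to that position, so a
    reward may depend on the tracks shown before it, unlike estimators
    that assume per-position independence.
    """
    ratios = target_probs / log_probs
    prefix_weights = np.cumprod(ratios, axis=1)
    return float(np.mean(np.sum(prefix_weights * rewards, axis=1)))

# Toy usage with three logged slates of length two.
logp = np.array([[0.5, 0.4], [0.3, 0.5], [0.6, 0.2]])
tgtp = np.array([[0.6, 0.3], [0.4, 0.4], [0.5, 0.3]])
r = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(sequential_ips_value(logp, tgtp, r))
```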
Abstract: We present a systematic analysis of causal treatment assignment decision making, a general problem that arises in many applications and has received significant attention from economists, computer scientists, and social scientists. We focus on choosing, for each user, the best algorithm for playlist generation in order to optimize engagement. We group the methods proposed in the literature into three general approaches: learning models to predict outcomes, learning models to predict causal effects, and learning models to predict optimal treatment assignments. We show analytically that optimizing for outcome or causal-effect prediction is not the same as optimizing for treatment assignments, and thus we should prefer models that optimize for treatment assignments directly. For our playlist generation application, we compare and contrast the three approaches empirically. This is the first comparison of the different treatment assignment approaches on a real-world application at scale (based on more than half a billion individual treatment assignments). Our results show (i) that applying different algorithms to different users can improve streams substantially compared with deploying the same algorithm for everyone, (ii) that personalized assignments improve substantially with larger data sets, and (iii) that learning models by optimizing treatment assignments rather than outcome or causal-effect predictions can improve treatment assignment performance by more than 28%.
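To make the contrast concrete, here is a hypothetical sketch of the three approaches on logged data from a randomized assignment (X: user features, t in {0, 1}: assigned algorithm, y: streams, assumed nonnegative); the model classes, the transformed-outcome regression, and the reward-weighted-classification reduction are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def outcome_policy(X, t, y):
    """Approach 1: predict the outcome under each algorithm, assign the argmax."""
    m0 = LinearRegression().fit(X[t == 0], y[t == 0])
    m1 = LinearRegression().fit(X[t == 1], y[t == 1])
    return lambda Xn: (m1.predict(Xn) > m0.predict(Xn)).astype(int)

def causal_effect_policy(X, t, y, p=0.5):
    """Approach 2: regress a transformed outcome whose expectation is the
    individual treatment effect, then assign where the effect is positive."""
    z = y * (t - p) / (p * (1 - p))
    tau = LinearRegression().fit(X, z)
    return lambda Xn: (tau.predict(Xn) > 0).astype(int)

def assignment_policy(X, t, y, p=0.5):
    """Approach 3: learn the assignment directly via reward-weighted
    classification: imitate the logged action, weighted by its
    inverse-propensity-weighted reward, so the training loss tracks
    the value of the learned policy rather than prediction accuracy."""
    clf = LogisticRegression().fit(X, t, sample_weight=y / p)
    return lambda Xn: clf.predict(Xn)
```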
Abstract: It remains unknown whether personalized recommendations increase or decrease the diversity of content people consume. We present results from a randomized field experiment on Spotify testing the effect of personalized recommendations on consumption diversity. In the experiment, both control and treatment users were given podcast recommendations, with the sole aim of increasing podcast consumption. Treatment users' recommendations were personalized based on their music listening history, whereas control users were recommended podcasts popular among users in their demographic group. We find that, on average, the treatment increased podcast streams by 28.90%. However, the treatment also decreased the average individual-level diversity of podcast streams by 11.51% and increased the aggregate diversity of podcast streams by 5.96%, indicating that personalized recommendations can create consumption patterns that are homogeneous within users and diverse across users, a pattern reflecting Balkanization. Our results provide evidence of an "engagement-diversity trade-off" when recommendations are optimized solely to drive consumption: while personalized recommendations increase user engagement, they also affect the diversity of the content consumed. This shift in consumption diversity can affect user retention and lifetime value, and it can change the optimal strategy for content producers. We also observe evidence that the treatment affected streams from sections of Spotify's app not directly targeted by the experiment, suggesting that exposure to personalized recommendations can shape the content users consume organically. We believe these findings highlight the need for academics and practitioners to continue investing in personalization methods that explicitly account for the diversity of recommended content.
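The abstract distinguishes individual-level from aggregate diversity; a minimal sketch using Shannon entropy over content categories (the paper's actual diversity measure may differ) shows how the two can move in opposite directions:

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (bits) of a count vector."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def diversity_metrics(user_category_counts):
    """user_category_counts: (n_users, n_categories) podcast stream counts.

    Individual diversity: mean per-user entropy (how varied each user's
    own streams are). Aggregate diversity: entropy of the pooled counts
    (how varied consumption is across the whole population)."""
    active = [row for row in user_category_counts if row.sum() > 0]
    individual = float(np.mean([entropy(row) for row in active]))
    aggregate = entropy(user_category_counts.sum(axis=0))
    return individual, aggregate

# Each user concentrated on a different category: low individual diversity,
# high aggregate diversity ("homogeneous within, diverse across" users).
streams = np.array([[9, 1, 0], [0, 9, 1], [1, 0, 9]])
print(diversity_metrics(streams))
```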