Jan Bakus

Amazon Search

Toward Understanding Privileged Features Distillation in Learning-to-Rank

Sep 19, 2022

Shuo Yang, Sujay Sanghavi, Holakou Rahmanian, Jan Bakus, S. V. N. Vishwanathan

Abstract: In learning-to-rank problems, a privileged feature is one that is available during model training, but not available at test time. Such features naturally arise in merchandised recommendation systems; for instance, "user clicked this item" as a feature is predictive of "user purchased this item" in the offline data, but is clearly not available during online serving. Another source of privileged features is those that are too expensive to compute online but feasible to be added offline. Privileged features distillation (PFD) refers to a natural idea: train a "teacher" model using all features (including privileged ones) and then use it to train a "student" model that does not use the privileged features. In this paper, we first study PFD empirically on three public ranking datasets and an industrial-scale ranking problem derived from Amazon's logs. We show that PFD outperforms several baselines (no-distillation, pretraining-finetuning, self-distillation, and generalized distillation) on all these datasets. Next, we analyze why and when PFD performs well via both empirical ablation studies and theoretical analysis for linear models. Both investigations uncover an interesting non-monotone behavior: as the predictive power of a privileged feature increases, the performance of the resulting student model initially increases but then decreases. We show that the reason for the later decrease in performance is that a very predictive privileged teacher produces predictions with high variance, which lead to high-variance student estimates and inferior test performance.

* Accepted by NeurIPS 2022
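
The teacher-student recipe described in the abstract above can be illustrated with a minimal sketch for linear models. This is not the paper's implementation: it swaps the ranking losses for ridge regression with squared loss, and the names `fit_ridge`, `pfd_linear`, and the mixing weight `alpha` are assumptions introduced here for illustration. With squared loss, weighting the label loss by `1 - alpha` and the distillation loss by `alpha` is equivalent to regressing on the mixed target used below.

```python
import numpy as np

def fit_ridge(X, y, reg=1e-3):
    """Closed-form ridge regression: w = (X^T X + reg * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)

def pfd_linear(X_regular, X_privileged, y, alpha=0.5, reg=1e-3):
    """Hypothetical linear PFD sketch (squared loss, not a ranking loss)."""
    # Teacher: trained on regular AND privileged features.
    X_teacher = np.hstack([X_regular, X_privileged])
    w_teacher = fit_ridge(X_teacher, y, reg)
    teacher_pred = X_teacher @ w_teacher

    # Student: sees only regular features, fits a mix of true labels
    # and teacher predictions. alpha = 0 recovers the no-distillation baseline.
    target = (1.0 - alpha) * y + alpha * teacher_pred
    w_student = fit_ridge(X_regular, target, reg)
    return w_teacher, w_student

# Synthetic data (assumed for illustration): 3 regular features, 1 privileged.
rng = np.random.default_rng(0)
n = 200
X_regular = rng.normal(size=(n, 3))
X_privileged = rng.normal(size=(n, 1))
y = (X_regular @ np.array([1.0, -2.0, 0.5])
     + 2.0 * X_privileged[:, 0]
     + 0.1 * rng.normal(size=n))

w_teacher, w_student = pfd_linear(X_regular, X_privileged, y)

# At serving time only the regular features are needed.
student_scores = X_regular @ w_student
```

The student's weight vector has one entry per regular feature, so scoring never touches the privileged column; how much the distillation term helps depends on how predictive the privileged feature is, which is the question the abstract studies.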

PLAtE: A Large-scale Dataset for List Page Web Extraction

May 24, 2022

Aidan San, Jan Bakus, Colin Lockard, David Ciemiewicz, Yangfeng Ji, Sandeep Atluri, Kevin Small, Heba Elfardy

Abstract: Recently, neural models have been leveraged to significantly improve the performance of information extraction from semi-structured websites. However, a barrier for continued progress is the small number of datasets large enough to train these models. In this work, we introduce the PLAtE (Pages of Lists Attribute Extraction) dataset as a challenging new web extraction task. PLAtE focuses on shopping data, specifically extractions from product review pages with multiple items. PLAtE encompasses both the tasks of: (1) finding product-list segmentation boundaries and (2) extracting attributes for each product. PLAtE is composed of 53,905 items from 6,810 pages, making it the first large-scale list page web extraction dataset. We construct PLAtE by collecting list pages from Common Crawl, then annotating them on Mechanical Turk. Quantitative and qualitative analyses are performed to demonstrate PLAtE has high-quality annotations. We establish strong baseline performance on PLAtE with a SOTA model achieving an F1-score of 0.750 for attribute classification and 0.915 for segmentation, indicating opportunities for future research innovations in web extraction.

Analysis of E-commerce Ranking Signals via Signal Temporal Logic

Jan 14, 2021

Tommaso Dreossi, Giorgio Ballardin, Parth Gupta, Jan Bakus, Yu-Hsiang Lin, Vamsi Salaka

Abstract: The timed position of documents retrieved by learning-to-rank models can be seen as signals. Signals carry useful information such as the drop or rise of documents over time or user behaviors. In this work, we propose to use the logic formalism called Signal Temporal Logic (STL) to characterize document behaviors in ranking according to specified formulas. Our analysis shows that interesting document behaviors can be easily formalized and detected thanks to STL formulas. We validate our idea on a dataset of 100K product signals. Through the presented framework, we uncover interesting patterns, such as cold start, warm start, and spikes, and inspect how they affect our learning-to-rank models.

* EPTCS 331, 2021, pp. 33-42
* In Proceedings SNR 2020, arXiv:2101.05256
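
To make the idea in the abstract above concrete, here is a minimal sketch of checking an STL-style formula against a ranking signal. It is not the paper's framework: it uses Boolean semantics on a discrete-time signal, clips time windows to the end of the signal, and the helper names (`atom`, `always`, `eventually`, `conj`), the "cold start" formula, and its thresholds are all assumptions chosen for illustration.

```python
from typing import Callable, Sequence

Signal = Sequence[float]                 # rank position per time step (1 = top)
Formula = Callable[[Signal, int], bool]  # formula evaluated on a signal at time t

def atom(pred: Callable[[float], bool]) -> Formula:
    """Atomic predicate on the signal value at time t."""
    return lambda sig, t: bool(pred(sig[t]))

def always(phi: Formula, a: int, b: int) -> Formula:
    """G_[a,b] phi: phi holds at every step in [t+a, t+b], clipped to the signal."""
    def check(sig: Signal, t: int) -> bool:
        last = min(t + b, len(sig) - 1)
        return all(phi(sig, u) for u in range(t + a, last + 1))
    return check

def eventually(phi: Formula, a: int, b: int) -> Formula:
    """F_[a,b] phi: phi holds at some step in [t+a, t+b], clipped to the signal."""
    def check(sig: Signal, t: int) -> bool:
        last = min(t + b, len(sig) - 1)
        return any(phi(sig, u) for u in range(t + a, last + 1))
    return check

def conj(p: Formula, q: Formula) -> Formula:
    """Conjunction of two formulas."""
    return lambda sig, t: p(sig, t) and q(sig, t)

# Hypothetical "cold start": ranked below position 50 for the first 3 steps,
# then reaches the top 10 at some point within steps 3..9.
cold_start = conj(
    always(atom(lambda pos: pos > 50), 0, 2),
    eventually(atom(lambda pos: pos <= 10), 3, 9),
)

rising = [80, 75, 60, 40, 20, 8, 5, 6, 7, 9]  # starts low, climbs to the top
steady = [5, 4, 6, 5, 5, 4, 6, 5, 5, 4]       # always near the top

print(cold_start(rising, 0))  # True
print(cold_start(steady, 0))  # False
```

Other patterns named in the abstract, such as spikes or warm start, could be written the same way by composing `always` and `eventually` with different thresholds and windows.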
