Abstract:Native Language Identification (NLI) intends to classify an author's native language based on their writing in another language. Historically, the task has heavily relied on time-consuming linguistic feature engineering, and transformer-based NLI models have thus far failed to offer effective, practical alternatives. The current work investigates if input size is a limiting factor, and shows that classifiers trained using Big Bird embeddings outperform linguistic feature engineering models by a large margin on the Reddit-L2 dataset. Additionally, we provide further insight into input length dependencies, show consistent out-of-sample performance, and qualitatively analyze the embedding space. Given the effectiveness and computational efficiency of this method, we believe it offers a promising avenue for future NLI work.
Abstract:The 2021 SIGIR workshop on eCommerce is hosting the Coveo Data Challenge for "In-session prediction for purchase intent and recommendations". The challenge addresses the growing need for reliable predictions within the boundaries of a shopping session, as customer intentions can be different depending on the occasion. The need for efficient procedures for personalization is even clearer if we consider the e-commerce landscape more broadly: outside of giant digital retailers, the constraints of the problem are stricter, due to smaller user bases and the realization that most users are not frequently returning customers. We release a new session-based dataset including more than 30M fine-grained browsing events (product detail, add, purchase), enriched by linguistic behavior (queries made by shoppers, with items clicked and items not clicked after the query) and catalog meta-data (images, text, pricing information). On this dataset, we ask participants to showcase innovative solutions for two open problems: a recommendation task (where a model is shown some events at the start of a session, and it is asked to predict future product interactions); an intent prediction task, where a model is shown a session containing an add-to-cart event, and it is asked to predict whether the item will be bought before the end of the session.
Abstract:Knowing if a user is a buyer vs window shopper solely based on clickstream data is of crucial importance for ecommerce platforms seeking to implement real-time accurate NBA (next best action) policies. However, due to the low frequency of conversion events and the noisiness of browsing data, classifying user sessions is very challenging. In this paper, we address the clickstream classification problem in the fashion industry and present three major contributions to the burgeoning field of AI in fashion: first, we collected, normalized and prepared a novel dataset of live shopping sessions from a major European e-commerce fashion website; second, we use the dataset to test in a controlled environment strong baselines and SOTA models from the literature; finally, we propose a new discriminative neural model that outperforms neural architectures recently proposed at Rakuten labs.