Patrick John Chia

How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis

Feb 08, 2024

Federico Bianchi, Patrick John Chia, Mert Yuksekgonul, Jacopo Tagliabue, Dan Jurafsky, James Zou

Abstract: Negotiation is the basis of social interactions; humans negotiate everything from the price of cars to how to share common resources. With rapidly growing interest in using large language models (LLMs) to act as agents on behalf of human users, such LLM agents would also need to be able to negotiate. In this paper, we study how well LLMs can negotiate with each other. We develop NegotiationArena: a flexible framework for evaluating and probing the negotiation abilities of LLM agents. We implemented three types of scenarios in NegotiationArena to assess LLMs' behaviors in allocating shared resources (ultimatum games), aggregating resources (trading games) and buying/selling goods (price negotiations). Each scenario allows for multiple turns of flexible dialogue between LLM agents to allow for more complex negotiations. Interestingly, LLM agents can significantly boost their negotiation outcomes by employing certain behavioral tactics. For example, by pretending to be desolate and desperate, LLMs can improve their payoffs by 20% when negotiating against the standard GPT-4. We also quantify irrational negotiation behaviors exhibited by the LLM agents, many of which also appear in humans. Together, NegotiationArena offers a new environment to investigate LLM interactions, enabling new insights into LLMs' theory of mind, irrationality, and reasoning abilities.
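The ultimatum-game scenario named in the abstract can be illustrated with a minimal payoff sketch. The function name and the 100-unit total below are hypothetical and not taken from NegotiationArena; the sketch only encodes the textbook single-round rule that a rejected offer leaves both players with nothing.

```python
from typing import Tuple


def ultimatum_payoffs(total: int, offer: int, accepted: bool) -> Tuple[int, int]:
    """Return (proposer_payoff, responder_payoff) for one ultimatum round.

    Hypothetical helper for illustration only; not part of NegotiationArena.
    """
    if not 0 <= offer <= total:
        raise ValueError("offer must be between 0 and total")
    if accepted:
        # The responder keeps the offer, the proposer keeps the remainder.
        return total - offer, offer
    # A rejection means neither side receives anything.
    return 0, 0


if __name__ == "__main__":
    print(ultimatum_payoffs(100, 30, accepted=True))
    print(ultimatum_payoffs(100, 30, accepted=False))
```

The multi-turn dialogue studied in the paper is not modelled here; this covers only the final accept/reject step.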

E Pluribus Unum: Guidelines on Multi-Objective Evaluation of Recommender Systems

Apr 20, 2023

Patrick John Chia, Giuseppe Attanasio, Jacopo Tagliabue, Federico Bianchi, Ciro Greco, Gabriel de Souza P. Moreira, Davide Eynard, Fahd Husain

Abstract: Recommender Systems today are still mostly evaluated in terms of accuracy, with other aspects beyond the immediate relevance of recommendations, such as diversity, long-term user retention and fairness, often taking a back seat. Moreover, reconciling multiple performance perspectives is by definition indeterminate, presenting a stumbling block to those in the pursuit of rounded evaluation of Recommender Systems. EvalRS 2022 -- a data challenge designed around Multi-Objective Evaluation -- was a first practical endeavour, providing many insights into the requirements and challenges of balancing multiple objectives in evaluation. In this work, we reflect on EvalRS 2022 and expound upon crucial learnings to formulate a first-principles approach toward Multi-Objective model selection, and outline a set of guidelines for carrying out a Multi-Objective Evaluation challenge, with potential applicability to the problem of rounded evaluation of competing models in real-world deployments.

* 15 pages, under submission

EvalRS 2023. Well-Rounded Recommender Systems For Real-World Deployments

Apr 19, 2023

Federico Bianchi, Patrick John Chia, Ciro Greco, Claudio Pomo, Gabriel Moreira, Davide Eynard, Fahd Husain, Jacopo Tagliabue

Abstract: EvalRS aims to bring together practitioners from industry and academia to foster a debate on rounded evaluation of recommender systems, with a focus on real-world impact across a multitude of deployment scenarios. Recommender systems are often evaluated only through accuracy metrics, which fall short of fully characterizing their generalization capabilities and miss important aspects, such as fairness, bias, usefulness, and informativeness. This workshop builds on the success of last year's workshop at CIKM, but with a broader scope and an interactive format.

* EvalRS 2023 will be a workshop hosted at KDD23

EvalRS: a Rounded Evaluation of Recommender Systems

Jul 12, 2022

Jacopo Tagliabue, Federico Bianchi, Tobias Schnabel, Giuseppe Attanasio, Ciro Greco, Gabriel de Souza P. Moreira, Patrick John Chia

Abstract: Much of the complexity of Recommender Systems (RSs) comes from the fact that they are used as part of more complex applications and affect user experience through a varied range of user interfaces. However, research has focused almost exclusively on the ability of RSs to produce accurate item rankings while giving little attention to the evaluation of RS behavior in real-world scenarios. Such narrow focus has limited the capacity of RSs to have a lasting impact in the real world and makes them vulnerable to undesired behavior, such as reinforcing data biases. We propose EvalRS as a new type of challenge, in order to foster this discussion among practitioners and build in the open new methodologies for testing RSs "in the wild".

* CIKM 2022 Data Challenge Paper

FashionCLIP: Connecting Language and Images for Product Representations

Apr 11, 2022

Patrick John Chia, Giuseppe Attanasio, Federico Bianchi, Silvia Terragni, Ana Rita Magalhães, Diogo Goncalves, Ciro Greco, Jacopo Tagliabue

Abstract: The steady rise of online shopping goes hand in hand with the development of increasingly complex ML and NLP models. While most use cases are cast as specialized supervised learning problems, we argue that practitioners would greatly benefit from more transferable representations of products. In this work, we build on recent developments in contrastive learning to train FashionCLIP, a CLIP-like model for the fashion industry. We showcase its capabilities for retrieval, classification and grounding, and release our model and code to the community.

* Code will soon be available at https://github.com/patrickjohncyh, dataset at https://github.com/Farfetch

"Does it come in black?" CLIP-like models are zero-shot recommenders

Apr 11, 2022

Patrick John Chia, Jacopo Tagliabue, Federico Bianchi, Ciro Greco, Diogo Goncalves

Abstract: Product discovery is a crucial component for online shopping. However, item-to-item recommendations today do not allow users to explore changes along selected dimensions: given a query item, can a model suggest something similar but in a different color? We consider item recommendations of a comparative nature (e.g. "something darker") and show how CLIP-based models can support this use case in a zero-shot manner. Leveraging a large model built for fashion, we introduce GradREC and its industry potential, and offer a first rounded assessment of its strengths and weaknesses.

* Accepted at ACL 2022 (ECNLP)

Beyond NDCG: behavioral testing of recommender systems with RecList

Nov 18, 2021

Patrick John Chia, Jacopo Tagliabue, Federico Bianchi, Chloe He, Brian Ko

Abstract: As with most Machine Learning systems, recommender systems are typically evaluated through performance metrics computed over held-out data points. However, real-world behavior is undoubtedly nuanced: ad hoc error analysis and deployment-specific tests must be employed to ensure the desired quality in actual deployments. In this paper, we propose RecList, a behavioral-based testing methodology. RecList organizes recommender systems by use case and introduces a general plug-and-play procedure to scale up behavioral testing. We demonstrate its capabilities by analyzing known algorithms and black-box commercial systems, and we release RecList as an open source, extensible package for the community.

* Alpha draft
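For readers unfamiliar with the metric in the title, NDCG is one of the held-out performance metrics the abstract refers to. The following is a minimal sketch of NDCG@k using the common log2 discount; the function names are illustrative and are not RecList's API. As a simplifying assumption, the ideal ranking is built only from the relevance scores of the returned list rather than from the full ground truth.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked items."""
    # rank is 0-based, so the discount for the top item is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same scores."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance of each recommended item, in ranked order (made-up values).
    print(ndcg_at_k([0, 1, 0, 1], k=4))
```

A single aggregate score like this says nothing about which users or items the model fails on, which is the gap behavioral testing is meant to address.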

"Are you sure?": Preliminary Insights from Scaling Product Comparisons to Multiple Shops

Jul 08, 2021

Patrick John Chia, Bingqing Yu, Jacopo Tagliabue

Abstract: Large eCommerce players have introduced comparison tables as a new type of recommendation. However, building comparisons at scale without pre-existing training/taxonomy data remains an open challenge, especially within the operational constraints of shops in the long tail. We present preliminary results from building a comparison pipeline designed to scale in a multi-shop scenario: we describe our design choices and run extensive benchmarks on multiple shops to stress-test it. Finally, we run a small user study on property selection and conclude by discussing potential improvements and highlighting the questions that remain to be addressed.

* Accepted for publication at SIGIR eCom 2021

SIGIR 2021 E-Commerce Workshop Data Challenge

Apr 27, 2021

Jacopo Tagliabue, Ciro Greco, Jean-Francis Roy, Bingqing Yu, Patrick John Chia, Federico Bianchi, Giovanni Cassani

Abstract: The 2021 SIGIR workshop on eCommerce is hosting the Coveo Data Challenge for "In-session prediction for purchase intent and recommendations". The challenge addresses the growing need for reliable predictions within the boundaries of a shopping session, as customer intentions can be different depending on the occasion. The need for efficient procedures for personalization is even clearer if we consider the e-commerce landscape more broadly: outside of giant digital retailers, the constraints of the problem are stricter, due to smaller user bases and the realization that most users are not frequently returning customers. We release a new session-based dataset including more than 30M fine-grained browsing events (product detail, add, purchase), enriched by linguistic behavior (queries made by shoppers, with items clicked and items not clicked after the query) and catalog meta-data (images, text, pricing information). On this dataset, we ask participants to showcase innovative solutions for two open problems: a recommendation task (where a model is shown some events at the start of a session, and it is asked to predict future product interactions); an intent prediction task, where a model is shown a session containing an add-to-cart event, and it is asked to predict whether the item will be bought before the end of the session.

* SIGIR eCOM 2021 Data Challenge (pre-print)
