Bruno Brandão

InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context

Feb 17, 2025

Bryan L. M. de Oliveira, Luana G. B. Martins, Bruno Brandão, Luckeciano C. Melo

Abstract: While large language models excel at following explicit instructions, they often struggle with ambiguous or incomplete user requests, defaulting to verbose, generic responses rather than seeking clarification. We introduce InfoQuest, a multi-turn chat benchmark designed to evaluate how dialogue agents handle hidden context in open-ended user requests. The benchmark presents intentionally ambiguous scenarios that require models to engage in information-seeking dialogue through clarifying questions before providing appropriate responses. Our evaluation of both open and closed-source models reveals that while proprietary models generally perform better, all current assistants struggle with effectively gathering critical information, often requiring multiple turns to infer user intent and frequently defaulting to generic responses without proper clarification. We provide a systematic methodology for generating diverse scenarios and evaluating models' information-seeking capabilities, offering insights into the current limitations of language models in handling ambiguous requests through multi-turn interactions.

Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning

Oct 17, 2024

Bryan L. M. de Oliveira, Murilo L. da Luz, Bruno Brandão, Luana G. B. Martins, Telma W. de L. Soares, Luckeciano C. Melo

Abstract: Learning effective visual representations is crucial in open-world environments where agents encounter diverse and unstructured observations. This ability enables agents to extract meaningful information from raw sensory inputs, like pixels, which is essential for generalization across different tasks. However, evaluating representation learning separately from policy learning remains a challenge in most reinforcement learning (RL) benchmarks. To address this, we introduce the Sliding Puzzles Gym (SPGym), a benchmark that extends the classic 15-tile puzzle with variable grid sizes and observation spaces, including large real-world image datasets. SPGym allows scaling the representation learning challenge while keeping the latent environment dynamics and algorithmic problem fixed, providing a targeted assessment of agents' ability to form compositional and generalizable state representations. Experiments with both model-free and model-based RL algorithms, with and without explicit representation learning components, show that as the representation challenge scales, SPGym effectively distinguishes agents based on their capabilities. Moreover, SPGym reaches difficulty levels where no tested algorithm consistently excels, highlighting key challenges and opportunities for advancing representation learning for decision-making research.
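
To make the idea in this abstract concrete, here is a minimal sketch of a sliding puzzle with a variable grid size and a swappable observation function. It is not SPGym's actual API: the class name `SlidingPuzzle`, its methods, and the `render` parameter are hypothetical names chosen for illustration. The point it tries to show is that the tile dynamics stay the same while the way each tile is presented to the agent can change.

```python
import random


class SlidingPuzzle:
    """Toy n x n sliding puzzle (hypothetical, not the SPGym API). 0 is the blank."""

    # Each action moves the blank one cell in the named direction.
    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, size=4, shuffle_steps=100, seed=None):
        self.size = size
        self.rng = random.Random(seed)
        self.goal = list(range(1, size * size)) + [0]
        self.state = list(self.goal)
        # Scramble with random legal moves so the puzzle is always solvable.
        for _ in range(shuffle_steps):
            self.step(self.rng.choice(list(self.MOVES)))

    def step(self, action):
        """Apply one action; illegal moves leave the state unchanged."""
        dr, dc = self.MOVES[action]
        blank = self.state.index(0)
        row, col = divmod(blank, self.size)
        new_row, new_col = row + dr, col + dc
        if 0 <= new_row < self.size and 0 <= new_col < self.size:
            target = new_row * self.size + new_col
            self.state[blank], self.state[target] = (
                self.state[target],
                self.state[blank],
            )
        return self.solved()

    def solved(self):
        return self.state == self.goal

    def observe(self, render=lambda tile: tile):
        """Return the grid, mapping each tile id through `render`.

        Swapping `render` (for example, tile id -> image patch) changes the
        observation space without touching the dynamics in `step`.
        """
        n = self.size
        return [[render(self.state[r * n + c]) for c in range(n)] for r in range(n)]
```

Under these assumptions, `SlidingPuzzle(size=3)` and `SlidingPuzzle(size=5)` share the same transition rule, and passing a different `render` function stands in for moving from symbolic tiles to image-based observations.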

MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces

Sep 30, 2020

Marlesson R. O. Santana, Luckeciano C. Melo, Fernando H. F. Camargo, Bruno Brandão, Anderson Soares, Renan M. Oliveira, Sandor Caetano

Abstract: Recommender Systems are especially challenging for marketplaces since they must maximize user satisfaction while maintaining the health and fairness of such ecosystems. In this context, we observed a lack of resources to design, train, and evaluate agents that learn by interacting within these environments. To address this, we propose MARS-Gym, an open-source framework to empower researchers and engineers to quickly build and evaluate Reinforcement Learning agents for recommendations in marketplaces. MARS-Gym addresses the whole development pipeline: data processing, model design and optimization, and multi-sided evaluation. We also provide implementations of a diverse set of baseline agents, with a metrics-driven analysis of them on the Trivago marketplace dataset, to illustrate how to conduct a holistic assessment using the available metrics of recommendation, off-policy estimation, and fairness. With MARS-Gym, we expect to bridge the gap between academic research and production systems, as well as to facilitate the design of new algorithms and applications.

* 15 pages, 14 figures, see https://github.com/deeplearningbrasil/mars-gym
