Joseph Davidson

Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models

Sep 30, 2024

David Castillo-Bolado, Joseph Davidson, Finlay Gray, Marek Rosa

Abstract: We introduce a dynamic benchmarking system for conversational agents that evaluates their performance through a single, simulated, and lengthy user$\leftrightarrow$agent interaction. The interaction is a conversation between the user and agent, in which multiple tasks are introduced and then undertaken concurrently. We switch context regularly to interleave the tasks, constructing a realistic testing scenario in which we assess the Long-Term Memory, Continual Learning, and Information Integration capabilities of the agents. Results from both proprietary and open-source Large Language Models show that LLMs in general perform well on single-task interactions, but struggle on the same tasks when they are interleaved. Notably, short-context LLMs supplemented with an LTM system perform as well as or better than those with larger contexts. Our benchmark suggests that more natural interactions pose further challenges for LLMs that contemporary benchmarks have so far been unable to capture.

* Accepted as a poster at NeurIPS D&B Track 2024

BADGER: Learning to (Learn [Learning Algorithms] through Multi-Agent Communication)

Dec 03, 2019

Marek Rosa, Olga Afanasjeva, Simon Andersson, Joseph Davidson, Nicholas Guttenberg, Petr Hlubuček, Martin Poliak, Jaroslav Vítku, Jan Feyereisl

Abstract: In this work, we propose a novel memory-based multi-agent meta-learning architecture and learning procedure that allows for learning of a shared communication policy that enables the emergence of rapid adaptation to new and unseen environments by learning to learn learning algorithms through communication. Behavior, adaptation, and learning to adapt emerge from the interactions of homogeneous experts inside a single agent. The proposed architecture should allow for generalization beyond the level seen in existing methods, in part due to the use of a single policy shared by all experts within the agent as well as the inherent modularity of 'Badger'.

ToyArchitecture: Unsupervised Learning of Interpretable Models of the World

Apr 12, 2019

Jaroslav Vítků, Petr Dluhoš, Joseph Davidson, Matěj Nikl, Simon Andersson, Přemysl Paška, Jan Šinkora, Petr Hlubuček, Martin Stránský, Martin Hyben (+3 more)

Abstract: Research in Artificial Intelligence (AI) has focused mostly on two extremes: either on small improvements in narrow AI domains, or on universal theoretical frameworks which are usually uncomputable, incompatible with theories of biological intelligence, or lack practical implementations. The goal of this work is to combine the main advantages of the two: to follow a big-picture view while providing a particular theory and its implementation. In contrast with purely theoretical approaches, the resulting architecture should be usable in realistic settings, but also form the core of a framework containing all the basic mechanisms, into which it should be easier to integrate additional required functionality. In this paper, we present a novel, purposely simple, and interpretable hierarchical architecture which combines multiple different mechanisms into one system: unsupervised learning of a model of the world, learning the influence of one's own actions on the world, model-based reinforcement learning, hierarchical planning and plan execution, and symbolic/sub-symbolic integration in general. The learned model is stored in the form of hierarchical representations with the following properties: 1) they are increasingly abstract, but can retain details when needed, and 2) they are easy to manipulate in their local and symbolic-like form, thus also allowing one to observe the learning process at each level of abstraction. On all levels of the system, the representation of the data can be interpreted in both a symbolic and a sub-symbolic manner. This enables the architecture to learn efficiently using sub-symbolic methods and to employ symbolic inference.

* Revision: added a paragraph in Appendix F with an explanation, reformatted tables so that they do not protrude into the next column, corrected English in the Appendices
