Tom Bewley

Voxtral

Jul 17, 2025

Alexander H. Liu, Andy Ehrenberg, Andy Lo, Clément Denoix, Corentin Barreau, Guillaume Lample, Jean-Malo Delignon, Khyathi Raghavi Chandu, Patrick von Platen, Pavankumar Reddy Muddireddy (+96 more)

Abstract: We present Voxtral Mini and Voxtral Small, two multimodal audio chat models. Voxtral is trained to comprehend both spoken audio and text documents, achieving state-of-the-art performance across a diverse range of audio benchmarks, while preserving strong text capabilities. Voxtral Small outperforms a number of closed-source models, while being small enough to run locally. A 32K context window enables the model to handle audio files up to 40 minutes in duration and long multi-turn conversations. We also contribute three benchmarks for evaluating speech understanding models on knowledge and trivia. Both Voxtral models are released under the Apache 2.0 license.

* 17 pages

Zero-Shot Reinforcement Learning Under Partial Observability

Jun 18, 2025

Scott Jeen, Tom Bewley, Jonathan M. Cullen

Abstract: Recent work has shown that, under certain assumptions, zero-shot reinforcement learning (RL) methods can generalise to any unseen task in an environment after reward-free pre-training. Access to Markov states is one such assumption, yet, in many real-world applications, the Markov state is only partially observable. Here, we explore how the performance of standard zero-shot RL methods degrades when subjected to partial observability, and show that, as in single-task RL, memory-based architectures are an effective remedy. We evaluate our memory-based zero-shot RL methods in domains where the states, rewards and a change in dynamics are partially observed, and show improved performance over memory-free baselines. Our code is open-sourced at https://enjeeneer.io/projects/bfms-with-memory/.

* Reinforcement Learning Conference 2025

Sequential Harmful Shift Detection Without Labels

Dec 17, 2024

Salim I. Amoukou, Tom Bewley, Saumitra Mishra, Freddy Lecue, Daniele Magazzeni, Manuela Veloso

Abstract: We introduce a novel approach for detecting distribution shifts that negatively impact the performance of machine learning models in continuous production environments, which requires no access to ground truth data labels. It builds upon the work of Podkopaev and Ramdas [2022], who address scenarios where labels are available for tracking model errors over time. Our solution extends this framework to work in the absence of labels, by employing a proxy for the true error. This proxy is derived using the predictions of a trained error estimator. Experiments show that our method has high power and false alarm control under various distribution shifts, including covariate and label shifts and natural shifts over geography and time.

* Accepted at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

Interpreting Language Reward Models via Contrastive Explanations

Nov 25, 2024

Junqi Jiang, Tom Bewley, Saumitra Mishra, Freddy Lecue, Manuela Veloso

Abstract: Reward models (RMs) are a crucial component in aligning the outputs of large language models (LLMs) with human values. RMs approximate human preferences over possible LLM responses to the same prompt by predicting and comparing reward scores. However, as they are typically modified versions of LLMs with scalar output heads, RMs are large black boxes whose predictions are not explainable. More transparent RMs would enable improved trust in the alignment of LLMs. In this work, we propose to use contrastive explanations to explain any binary response comparison made by an RM. Specifically, we generate a diverse set of new comparisons similar to the original one to characterise the RM's local behaviour. The perturbed responses forming the new comparisons are generated to explicitly modify manually specified high-level evaluation attributes, on which analyses of RM behaviour are grounded. In quantitative experiments, we validate the effectiveness of our method for finding high-quality contrastive explanations. We then showcase the qualitative usefulness of our method for investigating global sensitivity of RMs to each evaluation attribute, and demonstrate how representative examples can be automatically extracted to explain and compare behaviours of different RMs. We see our method as a flexible framework for RM explanation, providing a basis for more interpretable and trustworthy LLM alignment.
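
The binary response comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes a Bradley-Terry style link between scalar reward scores and preference probability, which is a common choice for reward models but is not stated in the abstract. The function names and the scores used in any call are hypothetical and are not taken from the paper.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Probability that response A is preferred over response B.

    Assumption: a Bradley-Terry style model, where the probability is the
    logistic sigmoid of the difference between the two scalar rewards.
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def preferred_response(reward_a: float, reward_b: float) -> str:
    """Binary comparison: return "A" if A scores strictly higher, else "B"."""
    return "A" if reward_a > reward_b else "B"
```

Under this assumption, equal scores give a preference probability of exactly 0.5, and the probabilities for the two orderings of a pair sum to 1.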

Counterfactual Metarules for Local and Global Recourse

May 29, 2024

Tom Bewley, Salim I. Amoukou, Saumitra Mishra, Daniele Magazzeni, Manuela Veloso

Abstract: We introduce T-CREx, a novel model-agnostic method for local and global counterfactual explanation (CE), which summarises recourse options for both individuals and groups in the form of human-readable rules. It leverages tree-based surrogate models to learn the counterfactual rules, alongside 'metarules' denoting their regions of optimality, providing both a global analysis of model behaviour and diverse recourse options for users. Experiments indicate that T-CREx achieves superior aggregate performance over existing rule-based baselines on a range of CE desiderata, while being orders of magnitude faster to run.

* Accepted at ICML 2024

Conservative World Models

Sep 26, 2023

Scott Jeen, Tom Bewley, Jonathan M. Cullen

Abstract: Zero-shot reinforcement learning (RL) promises to provide agents that can perform any task in an environment after an offline pre-training phase. Forward-backward (FB) representations represent remarkable progress towards this ideal, achieving 85% of the performance of task-specific agents in this setting. However, such performance is contingent on access to large and diverse datasets for pre-training, which cannot be expected for most real problems. Here, we explore how FB performance degrades when trained on small datasets that lack diversity, and mitigate this degradation with conservatism, a well-established feature of performant offline RL algorithms. We evaluate our family of methods across various datasets, domains and tasks, reaching 150% of vanilla FB performance in aggregate. Somewhat surprisingly, conservative FB algorithms also outperform the task-specific baseline, despite lacking access to reward labels and being required to maintain policies for all tasks. Conservative FB algorithms perform no worse than FB on full datasets, and so present little downside over their predecessor. Our code is open-source and available at https://enjeeneer.io/projects/conservative-world-models/.

* Project page: https://enjeeneer.io/projects/conservative-world-models/

Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback

May 26, 2023

Tom Bewley, Jonathan Lawry, Arthur Richards

Abstract: We propose a method to capture the handling abilities of fast jet pilots in a software model via reinforcement learning (RL) from human preference feedback. We use pairwise preferences over simulated flight trajectories to learn an interpretable rule-based model called a reward tree, which enables the automated scoring of trajectories alongside an explanatory rationale. We train an RL agent to execute high-quality handling behaviour by using the reward tree as the objective, and thereby generate data for iterative preference collection and further refinement of both tree and agent. Experiments with synthetic preferences show reward trees to be competitive with uninterpretable neural network reward models on quantitative and qualitative evaluations.

* arXiv admin note: substantial text overlap with arXiv:2210.01007

Reward Learning with Trees: Methods and Evaluation

Oct 03, 2022

Tom Bewley, Jonathan Lawry, Arthur Richards, Rachel Craddock, Ian Henderson

Abstract: Recent efforts to learn reward functions from human feedback have tended to use deep neural networks, whose lack of transparency hampers our ability to explain agent behaviour or verify alignment. We explore the merits of learning intrinsically interpretable tree models instead. We develop a recently proposed method for learning reward trees from preference labels, and show it to be broadly competitive with neural networks on challenging high-dimensional tasks, with good robustness to limited or corrupted data. Having found that reward tree learning can be done effectively in complex settings, we then consider why it should be used, demonstrating that the interpretable reward structure gives significant scope for traceability, verification and explanation.

* 22 pages (9 main body). Preprint, under review

Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning

May 30, 2022

Joseph Early, Tom Bewley, Christine Evers, Sarvapali Ramchurn

Abstract: We generalise the problem of reward modelling (RM) for reinforcement learning (RL) to handle non-Markovian rewards. Existing work assumes that human evaluators observe each step in a trajectory independently when providing feedback on agent behaviour. In this work, we remove this assumption, extending RM to include hidden state information that captures temporal dependencies in human assessment of trajectories. We then show how RM can be approached as a multiple instance learning (MIL) problem, and develop new MIL models that are able to capture the time dependencies in labelled trajectories. We demonstrate on a range of RL tasks that our novel MIL models can reconstruct reward functions to a high level of accuracy, and that they provide interpretable learnt hidden information that can be used to train high-performing agent policies.

* 20 pages (9 main content; 2 references; 9 appendix). 11 figures (8 main content; 3 appendix)

Summarising and Comparing Agent Dynamics with Contrastive Spatiotemporal Abstraction

Jan 17, 2022

Tom Bewley, Jonathan Lawry, Arthur Richards

Abstract: We introduce a data-driven, model-agnostic technique for generating a human-interpretable summary of the salient points of contrast within an evolving dynamical system, such as the learning process of a control agent. It involves the aggregation of transition data along both spatial and temporal dimensions according to an information-theoretic divergence measure. A practical algorithm is outlined for continuous state spaces, and deployed to summarise the learning histories of deep reinforcement learning agents with the aid of graphical and textual communication methods. We expect our method to be complementary to existing techniques in the realm of agent interpretability.

* 13 pages (6 body, 1 references, 6 appendix). Pre-print; under review
