Oussama Zekri

Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods

Feb 03, 2025

Oussama Zekri, Nicolas Boullé

Abstract: Discrete diffusion models have recently gained significant attention due to their ability to process complex discrete structures for language modeling. However, fine-tuning these models with policy gradient methods, as is commonly done in Reinforcement Learning from Human Feedback (RLHF), remains a challenging task. We propose an efficient, broadly applicable, and theoretically justified policy gradient algorithm, called Score Entropy Policy Optimization (SEPO), for fine-tuning discrete diffusion models over non-differentiable rewards. Our numerical experiments across several discrete generative tasks demonstrate the scalability and efficiency of our method. Our code is available at https://github.com/ozekri/SEPO

* 23 pages, 4 figures, 5 tables
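
To make the phrase "policy gradient over non-differentiable rewards" concrete, here is a minimal sketch of the plain score-function (REINFORCE) estimator on a toy categorical distribution. It is not SEPO and not taken from the paper's repository; the reward function, the three-token vocabulary, and every name below are assumptions chosen for illustration.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reward(token):
    # Hypothetical black-box reward: 1 for token 2, 0 otherwise.
    # It is never differentiated, only evaluated on samples.
    return 1.0 if token == 2 else 0.0

def reinforce(num_tokens=3, steps=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * num_tokens
    for _ in range(steps):
        probs = softmax(logits)
        token = rng.choices(range(num_tokens), weights=probs)[0]
        r = reward(token)
        # For a softmax policy, d/d logits of log p(token) = onehot(token) - probs.
        for i in range(num_tokens):
            indicator = 1.0 if i == token else 0.0
            logits[i] += lr * r * (indicator - probs[i])
    return softmax(logits)

final_probs = reinforce()
```

Under these assumptions the probability mass concentrates on the rewarded token, using only sampled rewards and the gradient of the log-probability.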

Zero-shot Model-based Reinforcement Learning using Large Language Models

Oct 15, 2024

Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat, Oussama Zekri, Albert Thomas, Giuseppe Paolo, Maurizio Filippone, Ievgen Redko, Balázs Kégl

Abstract: The emerging zero-shot capabilities of Large Language Models (LLMs) have led to their applications in areas extending well beyond natural language processing tasks. In reinforcement learning, while LLMs have been extensively used in text-based environments, their integration with continuous state spaces remains understudied. In this paper, we investigate how pre-trained LLMs can be leveraged to predict in context the dynamics of continuous Markov decision processes. We identify handling multivariate data and incorporating the control signal as key challenges that limit the potential of LLMs' deployment in this setup and propose Disentangled In-Context Learning (DICL) to address them. We present proof-of-concept applications in two reinforcement learning settings: model-based policy evaluation and data-augmented off-policy reinforcement learning, supported by theoretical analysis of the proposed methods. Our experiments further demonstrate that our approach produces well-calibrated uncertainty estimates. We release the code at https://github.com/abenechehab/dicl.

Large Language Models as Markov Chains

Oct 03, 2024

Oussama Zekri, Ambroise Odonnat, Abdelhakim Benechehab, Linus Bleistein, Nicolas Boullé, Ievgen Redko

Abstract: Large language models (LLMs) have proven to be remarkably efficient, both across a wide range of natural language processing tasks and well beyond them. However, a comprehensive theoretical analysis of the origins of their impressive performance remains elusive. In this paper, we approach this challenging task by drawing an equivalence between generic autoregressive language models with vocabulary of size $T$ and context window of size $K$ and Markov chains defined on a finite state space of size $\mathcal{O}(T^K)$. We derive several surprising findings related to the existence of a stationary distribution of Markov chains that capture the inference power of LLMs, their speed of convergence to it, and the influence of the temperature on the latter. We then prove pre-training and in-context generalization bounds and show how the drawn equivalence allows us to enrich their interpretation. Finally, we illustrate our theoretical guarantees with experiments on several recent LLMs to highlight how they capture the behavior observed in practice.

* 49 pages, 17 figures
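
The state-space size $\mathcal{O}(T^K)$ in the abstract can be illustrated with a small sketch. It assumes, as one natural reading, that a state is any token sequence of length 1 to $K$, so the count is $T + T^2 + \dots + T^K$, and that appending a token to a full context drops the oldest token. The uniform `toy_next_token_probs` is a hypothetical stand-in for a model's next-token distribution, not anything from the paper.

```python
from itertools import product

def enumerate_states(T, K):
    # All token sequences of length 1..K over a vocabulary {0, ..., T-1}.
    return [s for k in range(1, K + 1) for s in product(range(T), repeat=k)]

def next_state(state, token, K):
    # Append the new token and keep only the last K tokens (sliding window).
    return (state + (token,))[-K:]

def toy_next_token_probs(state, T):
    # Hypothetical placeholder for a language model: uniform over the vocabulary.
    return [1.0 / T] * T

def transition_row(state, T, K):
    # One row of the Markov transition kernel: next state -> probability.
    row = {}
    for token, p in enumerate(toy_next_token_probs(state, T)):
        target = next_state(state, token, K)
        row[target] = row.get(target, 0.0) + p
    return row

states = enumerate_states(T=2, K=3)
```

With $T = 2$ and $K = 3$ this enumerates $2 + 4 + 8 = 14$ states, and each row of the kernel has at most $T$ nonzero entries, so the chain is very sparse even though the state space grows exponentially in $K$.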

Can LLMs predict the convergence of Stochastic Gradient Descent?

Aug 03, 2024

Oussama Zekri, Abdelhakim Benechehab, Ievgen Redko

Abstract: Large language models are well known for their impressive performance across a wide range of tasks. One surprising example is the recently identified capacity of LLMs to understand the governing principles of dynamical systems satisfying the Markovian property. In this paper, we explore this direction further by studying the dynamics of stochastic gradient descent in convex and non-convex optimization. By leveraging the theoretical link between SGD and Markov chains, we show a remarkable zero-shot performance of LLMs in predicting the local minima to which SGD converges for previously unseen starting points. On a more general level, we inquire about the possibility of using LLMs to perform zero-shot randomized trials for larger deep learning models used in practice.

* 9 pages. Accepted to 1st ICML Workshop on In-Context Learning at ICML 2024
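
The link between SGD and Markov chains mentioned in the abstract can be seen in a minimal sketch: each iterate depends only on the previous iterate and fresh noise. The objective $f(x) = x^4 - 2x^2$ (local minima at $x = \pm 1$), the Gaussian gradient noise, and all parameter values are assumptions for illustration, not the paper's experimental setup.

```python
import random

def grad(x):
    # Gradient of the assumed non-convex objective f(x) = x**4 - 2*x**2.
    return 4 * x ** 3 - 4 * x

def sgd_step(x, lr, noise_std, rng):
    # Markovian update: the next iterate is a function of x and fresh noise only.
    return x - lr * (grad(x) + rng.gauss(0.0, noise_std))

def run_sgd(x0, steps=500, lr=0.01, noise_std=0.5, seed=0):
    rng = random.Random(seed)
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = sgd_step(x, lr, noise_std, rng)
        trajectory.append(x)
    return trajectory

right = run_sgd(0.5)    # starts in the basin of the minimum at x = 1
left = run_sgd(-0.5)    # starts in the basin of the minimum at x = -1
```

In this toy setting the starting point determines which local minimum the iterates settle near, which is the kind of quantity the abstract describes predicting for unseen starting points.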
