Mohammad Ghavamzadeh

INRIA Lille - Nord Europe

Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models

Jun 04, 2025
Via arXiv

Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models

May 24, 2025
Via arXiv

Ordering-based Conditions for Global Convergence of Policy Gradient Methods

Apr 02, 2025
Via arXiv

C-3DPO: Constrained Controlled Classification for Direct Preference Optimization

Feb 22, 2025
Via arXiv

Preference Optimization via Contrastive Divergence: Your Reward Model is Secretly an NLL Estimator

Feb 06, 2025
Via arXiv

Conservative Contextual Bandits: Beyond Linear Representations

Dec 09, 2024
Via arXiv

Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis

Oct 31, 2024
Via arXiv

Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models

Apr 02, 2024
Via arXiv

Contextual Bandits with Stage-wise Constraints

Jan 15, 2024
Via arXiv

Maximum Entropy Model Correction in Reinforcement Learning

Nov 29, 2023
Via arXiv