Kaixuan Ji

Near-Optimal Regret for KL-Regularized Multi-Armed Bandits

Mar 02, 2026
Via arXiv

Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability

Feb 09, 2025
Via arXiv

Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization

Oct 11, 2024
Via arXiv

Self-Play Preference Optimization for Language Model Alignment

May 01, 2024
Via arXiv

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

Feb 15, 2024
Via arXiv

Reinforcement Learning from Human Feedback with Active Queries

Feb 14, 2024
Via arXiv

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Jan 02, 2024
Via arXiv

Mastering the Task of Open Information Extraction with Large Language Models and Consistent Reasoning Environment

Oct 16, 2023
Via arXiv

BiLL-VTG: Bridging Large Language Models and Lightweight Visual Tools for Video-based Texts Generation

Oct 16, 2023
Via arXiv

Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

May 15, 2023
Via arXiv