
Kaixuan Ji

Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability

Feb 09, 2025

Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization

Oct 11, 2024

Self-Play Preference Optimization for Language Model Alignment

May 01, 2024

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

Feb 15, 2024

Reinforcement Learning from Human Feedback with Active Queries

Feb 14, 2024

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Jan 02, 2024

BiLL-VTG: Bridging Large Language Models and Lightweight Visual Tools for Video-based Texts Generation

Oct 16, 2023

Mastering the Task of Open Information Extraction with Large Language Models and Consistent Reasoning Environment

Oct 16, 2023

Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

May 15, 2023

Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers

Jul 14, 2022