Anikait Singh

Personalized Preference Fine-tuning of Diffusion Models

Jan 11, 2025
Via arXiv

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Jan 08, 2025
Via arXiv

Test-Time Alignment via Hypothesis Reweighting

Dec 11, 2024
Via arXiv

Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation

Oct 03, 2024
Via arXiv

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

Aug 15, 2024
Via arXiv

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Apr 23, 2024
Via arXiv

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Oct 17, 2023
Via arXiv

Robotic Offline RL from Internet Videos via Value-Function Pre-Training

Sep 22, 2023
Via arXiv

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Jul 28, 2023
Via arXiv

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

Mar 09, 2023
Via arXiv