Ruiqi Zhang

How Do LLMs Perform Two-Hop Reasoning in Context?

Feb 19, 2025
Via arXiv

Fast Best-of-N Decoding via Speculative Rejection

Oct 26, 2024
Via arXiv

Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning

Oct 09, 2024
Via arXiv

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning

Apr 08, 2024
Via arXiv

Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement

Feb 24, 2024
Via arXiv

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization

Feb 22, 2024
Via arXiv

AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition

Feb 18, 2024
Via arXiv

Spreeze: High-Throughput Parallel Reinforcement Learning Framework

Dec 11, 2023
Via arXiv

Explicifying Neural Implicit Fields for Efficient Dynamic Human Avatar Modeling via a Neural Explicit Surface

Aug 07, 2023
Via arXiv

Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data

Jul 10, 2023
Via arXiv