Picture for Tuo Zhao

Tuo Zhao

IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining

Add code
Mar 07, 2025
Viaarxiv icon

LLMs Can Generate a Better Answer by Aggregating Their Own Responses

Add code
Mar 06, 2025
Viaarxiv icon

A Minimalist Example of Edge-of-Stability and Progressive Sharpening

Add code
Mar 04, 2025
Viaarxiv icon

COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs

Add code
Feb 26, 2025
Viaarxiv icon

Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data

Add code
Feb 25, 2025
Viaarxiv icon

Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks

Add code
Oct 12, 2024
Figure 1 for Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks
Figure 2 for Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks
Figure 3 for Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks
Figure 4 for Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks
Viaarxiv icon

Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

Add code
Sep 16, 2024
Figure 1 for Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering
Figure 2 for Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering
Figure 3 for Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering
Figure 4 for Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering
Viaarxiv icon

Robust Reinforcement Learning from Corrupted Human Feedback

Add code
Jun 21, 2024
Viaarxiv icon

RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning

Add code
Jun 16, 2024
Viaarxiv icon

Adaptive Preference Scaling for Reinforcement Learning with Human Feedback

Add code
Jun 04, 2024
Viaarxiv icon