Shujian Zhang

MUSIC: MUlti-Step Instruction Contrast for Multi-Turn Reward Models

Dec 31, 2025
Via arXiv

Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process

Dec 30, 2025
Via arXiv

Eliciting Behaviors in Multi-Turn Conversations

Dec 29, 2025
Via arXiv

Optimized scheduling of electricity-heat cooperative system considering wind energy consumption and peak shaving and valley filling

Nov 19, 2025
Via arXiv

Principled Foundations for Preference Optimization

Jul 10, 2025
Via arXiv

T-REG: Preference Optimization with Token-Level Reward Regularization

Dec 03, 2024
Via arXiv

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

Oct 09, 2024
Via arXiv

SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe

Oct 07, 2024
Via arXiv

Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models

Sep 17, 2024
Via arXiv

WPO: Enhancing RLHF with Weighted Preference Optimization

Jun 17, 2024
Via arXiv