Jinman Zhao

University of Toronto

Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models

Feb 01, 2026
Via arXiv

Mitigating Hallucinations in Video Large Language Models via Spatiotemporal-Semantic Contrastive Decoding

Jan 30, 2026
Via arXiv

$λ$-GRPO: Unifying the GRPO Frameworks with Learnable Token Preferences

Oct 08, 2025
Via arXiv

Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding

Sep 08, 2025
Via arXiv

Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks

Jul 23, 2025
Via arXiv

Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment

Jun 24, 2025
Via arXiv

PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy

May 28, 2025
Via arXiv

UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models

May 26, 2025
Via arXiv

Sequence-level Large Language Model Training with Contrastive Preference Optimization

Feb 23, 2025
Via arXiv

False Discovery Rate Control via Frequentist-assisted Horseshoe

Feb 08, 2025
Via arXiv