Jiajun Song

GD$^2$PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

Jun 15, 2026
Via arXiv

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

Apr 26, 2026
Via arXiv

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Mar 30, 2026
Via arXiv

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

Mar 25, 2026
Via arXiv

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

Mar 10, 2026
Via arXiv

A Unified Representation Underlying the Judgment of Large Language Models

Oct 31, 2025
Via arXiv

VARMA-Enhanced Transformer for Time Series Forecasting

Sep 05, 2025
Via arXiv

SalientFusion: Context-Aware Compositional Zero-Shot Food Recognition

Sep 04, 2025
Via arXiv

Mind the Gap: The Divergence Between Human and LLM-Generated Tasks

Aug 01, 2025
Via arXiv

ToM-RL: Reinforcement Learning Unlocks Theory of Mind in Small LLMs

Apr 02, 2025
Via arXiv