Jiajun Song

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

Apr 26, 2026

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Mar 30, 2026

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

Mar 25, 2026

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

Mar 10, 2026

A Unified Representation Underlying the Judgment of Large Language Models

Oct 31, 2025

VARMA-Enhanced Transformer for Time Series Forecasting

Sep 05, 2025

SalientFusion: Context-Aware Compositional Zero-Shot Food Recognition

Sep 04, 2025

Mind the Gap: The Divergence Between Human and LLM-Generated Tasks

Aug 01, 2025

ToM-RL: Reinforcement Learning Unlocks Theory of Mind in Small LLMs

Apr 02, 2025

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Dec 31, 2024