Bin Wen

VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos

Feb 08, 2026

Joint Reward Modeling: Internalizing Chain-of-Thought for Efficient Visual Reward Models

Feb 07, 2026

SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning

Feb 07, 2026

OpenOneRec Technical Report

Dec 31, 2025

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding

Nov 07, 2025

EffiFusion-GAN: Efficient Fusion Generative Adversarial Network for Speech Enhancement

Aug 20, 2025

Kwai Keye-VL Technical Report

Jul 02, 2025

RecipeGen: A Step-Aligned Multimodal Benchmark for Real-World Recipe Generation

Jun 07, 2025

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning

May 27, 2025

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

May 05, 2025