Zeyu Wang

From Scalar Rewards to Potential Trends: Shaping Potential Landscapes for Model-Based Reinforcement Learning

Feb 03, 2026

Do VLMs Perceive or Recall? Probing Visual Perception vs. Memory with Classic Visual Illusions

Jan 29, 2026

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Jan 21, 2026

ComfySearch: Autonomous Exploration and Reasoning for ComfyUI Workflows

Jan 07, 2026

IDT: A Physically Grounded Transformer for Feed-Forward Multi-View Intrinsic Decomposition

Dec 31, 2025

Breaking the Passive Learning Trap: An Active Perception Strategy for Human Motion Prediction

Nov 18, 2025

Hi-Reco: High-Fidelity Real-Time Conversational Digital Humans

Nov 16, 2025

EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation

Nov 14, 2025

LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation

Oct 27, 2025

Resolving Ambiguity in Gaze-Facilitated Visual Assistant Interaction Paradigm

Sep 26, 2025