Xinlei Yu

OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention

Feb 05, 2026
Via arXiv

Dual Latent Memory for Visual Multi-agent System

Jan 31, 2026
Via arXiv

Memory in the Age of AI Agents

Dec 15, 2025
Via arXiv

Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

Nov 19, 2025
Via arXiv

VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models

Nov 14, 2025
Via arXiv

DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing

Oct 02, 2025
Via arXiv

SIFThinker: Spatially-Aware Image Focus for Visual Reasoning

Aug 08, 2025
Via arXiv

Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation

Apr 23, 2025
Via arXiv

ICH-SCNet: Intracerebral Hemorrhage Segmentation and Prognosis Classification Network Using CLIP-guided SAM mechanism

Nov 07, 2024
Via arXiv

ICHPro: Intracerebral Hemorrhage Prognosis Classification Via Joint-attention Fusion-based 3d Cross-modal Network

Feb 17, 2024
Via arXiv