Teng Wang

DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories

Feb 11, 2026
Via arXiv

OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval

Feb 09, 2026
Via arXiv

NaVIDA: Vision-Language Navigation with Inverse Dynamics Augmentation

Jan 26, 2026
Via arXiv

ActionFlow: A Pipelined Action Acceleration for Vision Language Models on Edge

Dec 23, 2025
Via arXiv

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

Dec 16, 2025
Via arXiv

ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries

Nov 18, 2025
Via arXiv

FLYINGTRUST: A Benchmark for Quadrotor Navigation Across Scenarios and Vehicles

Oct 30, 2025
Via arXiv

UltraHiT: A Hierarchical Transformer Architecture for Generalizable Internal Carotid Artery Robotic Ultrasonography

Sep 17, 2025
Via arXiv

Predicting person-level injury severity using crash narratives: A balanced approach with roadway classification and natural language process techniques

Sep 09, 2025
Via arXiv

CVBench: Evaluating Cross-Video Synergies for Complex Multimodal Understanding and Reasoning

Aug 28, 2025
Via arXiv