Long Qian

Bootstrapped Model Predictive Control

Mar 24, 2025
Via arXiv

Friend or Foe? Harnessing Controllable Overfitting for Anomaly Detection

Nov 30, 2024
Via arXiv

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

Nov 29, 2024
Via arXiv

Enhancing Decision Transformer with Diffusion-Based Trajectory Branch Generation

Nov 18, 2024
Via arXiv

Grounded Answers for Multi-agent Decision-making Problem through Generative World Model

Oct 03, 2024
Via arXiv

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

Feb 18, 2024
Via arXiv

Navigate Biopsy with Ultrasound under Augmented Reality Device: Towards Higher System Performance

Feb 04, 2024
Via arXiv

EVD Surgical Guidance with Retro-Reflective Tool Tracking and Spatial Reconstruction using Head-Mounted Augmented Reality Device

Jul 03, 2023
Via arXiv

Fine-Grained Semantically Aligned Vision-Language Pre-Training

Aug 04, 2022
Via arXiv

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

Aug 03, 2022
Via arXiv