Hamid Rezatofighi

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

Mar 27, 2025
Via arXiv

DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning

Mar 25, 2025
Via arXiv

Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting

Feb 20, 2025
Via arXiv

Normal-GS: 3D Gaussian Splatting with Normal-Involved Rendering

Oct 27, 2024
Via arXiv

TFS-NeRF: Template-Free NeRF for Semantic 3D Reconstruction of Dynamic Scene

Sep 26, 2024
Via arXiv

NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions

Sep 16, 2024
Via arXiv

How Well Can Vision Language Models See Image Details?

Aug 07, 2024
Via arXiv

DrVideo: Document Retrieval Based Long Video Understanding

Jun 18, 2024
Via arXiv

Social-MAE: Social Masked Autoencoder for Multi-person Motion Representation Learning

Apr 08, 2024
Via arXiv

DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation

Apr 06, 2024
Via arXiv