Mingfei Han

RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation

Dec 11, 2024
Via arXiv

EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation

Dec 06, 2024
Via arXiv

MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation

Nov 26, 2024
Via arXiv

StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration

Nov 07, 2024
Via arXiv

LongVLM: Efficient Long Video Understanding via Large Language Models

Apr 10, 2024
Via arXiv

Video Recognition in Portrait Mode

Dec 21, 2023
Via arXiv

Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

Dec 19, 2023
Via arXiv

Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition

Dec 04, 2023
Via arXiv

Mask Propagation for Efficient Video Semantic Segmentation

Oct 29, 2023
Via arXiv

An Efficient Spatio-Temporal Pyramid Transformer for Action Detection

Jul 21, 2022
Via arXiv