Minghan Li

HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard

Mar 18, 2025
Via arXiv

FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models

Mar 17, 2025
Via arXiv

Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models

Jan 28, 2025
Via arXiv

Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks

Dec 09, 2024
Via arXiv

KeyB2: Selecting Key Blocks is Also Important for Long Document Ranking with Large Language Models

Nov 09, 2024
Via arXiv

Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Oct 19, 2024
Via arXiv

SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition

Aug 21, 2024
Via arXiv

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

Jun 27, 2024
Via arXiv

Unifying Multimodal Retrieval via Document Screenshot Embedding

Jun 17, 2024
Via arXiv

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

May 29, 2024
Via arXiv