Peng Li

DJI Innovations Inc

How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game

Mar 13, 2025, via arXiv

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

Mar 13, 2025, via arXiv

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Mar 11, 2025, via arXiv

Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation

Mar 11, 2025, via arXiv

DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms

Mar 05, 2025, via arXiv

Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization

Feb 20, 2025, via arXiv

Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective

Feb 20, 2025, via arXiv

LLM-USO: Large Language Model-based Universal Sizing Optimizer

Feb 04, 2025, via arXiv

Perspective Transition of Large Language Models for Solving Subjective Tasks

Jan 16, 2025, via arXiv

Hierarchical Superpixel Segmentation via Structural Information Theory

Jan 13, 2025, via arXiv