Ming Li

School of Integrated Circuits, Peking University

OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models

Mar 10, 2026

Language-Invariant Multilingual Speaker Verification for the TidyVoice 2026 Challenge

Mar 09, 2026

Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement

Mar 04, 2026

D-GVIO: A Buffer-Driven and Efficient Decentralized GNSS-Visual-Inertial State Estimator for Multi-Agent Systems

Mar 03, 2026

Memory-guided Prototypical Co-occurrence Learning for Mixed Emotion Recognition

Feb 24, 2026

PyVision-RL: Forging Open Agentic Vision Models via RL

Feb 24, 2026

DataCube: A Video Retrieval Platform via Natural Language Semantic Profiling

Feb 18, 2026

Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook

Feb 15, 2026

What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis

Feb 12, 2026

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Feb 11, 2026