Gen Li

GIVE: Grounding Human Gestures in Vision-Language-Action Models

Jun 11, 2026
Via arXiv

PACT: Learning Diverse Diagnostic Strategies via Privileged Synthesis and Branch Consensus

Jun 08, 2026
Via arXiv

MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models

Jun 03, 2026
Via arXiv

OccamToken: Efficient VLM Inference with Training-Free and Budget-Adaptive Token Pruning

May 28, 2026
Via arXiv

Gaze2Act: Gaze-Conditioned Vision-Language-Action Policies for Interactive Robot Manipulation

May 28, 2026
Via arXiv

MARS Policy: Multimodality Only When It Matters

May 28, 2026
Via arXiv

C-MIG: Multi-view Information Gain-based Retrieval-Augmented Generation for Clinical Diagnosis Reasoning

May 27, 2026
Via arXiv

EAPO: Entropy-Driven Adaptive Positive-Negative Sample Weighting for Policy Optimization in Open-Ended QA

May 27, 2026
Via arXiv

Mags-RL: Wearing Multimodal LLMs a Magnifying Glass via Agentic Reinforcement Learning For Complex Scene Reasoning

May 27, 2026
Via arXiv

Capture-Calibrate-Coach: A Graph-Based Framework for Knowledge Monitoring Estimation and Adaptive Feedback

May 25, 2026
Via arXiv