Pengxiang Li

Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds

Oct 31, 2025

Diffusion Language Models Know the Answer Before Decoding

Aug 27, 2025

DriveAgent-R1: Advancing VLM-based Autonomous Driving with Hybrid Thinking and Active Perception

Jul 28, 2025

From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes

Jun 05, 2025

Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking

May 26, 2025

Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL

May 21, 2025

TransDiffuser: End-to-end Trajectory Generation with Decorrelated Multi-modal Representation for Autonomous Driving

May 14, 2025

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning

May 06, 2025

Iterative Trajectory Exploration for Multimodal Agents

Apr 30, 2025

InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners

Apr 19, 2025