Hongsheng Li

Empowering LLMs in Decision Games through Algorithmic Data Synthesis

Mar 18, 2025
Via arXiv

Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning

Mar 14, 2025
Via arXiv

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Mar 13, 2025
Via arXiv

CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models

Mar 13, 2025
Via arXiv

DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation

Mar 10, 2025
Via arXiv

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation

Mar 10, 2025
Via arXiv

GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices

Mar 08, 2025
Via arXiv

SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?

Mar 08, 2025
Via arXiv

FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering

Feb 28, 2025
Via arXiv

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Feb 13, 2025
Via arXiv