Lin Ma

InstructionBench: An Instructional Video Understanding Benchmark

Apr 07, 2025
Via arXiv

UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding

Apr 06, 2025
Via arXiv

UniViTAR: Unified Vision Transformer with Native Resolution

Apr 02, 2025
Via arXiv

AP-CAP: Advancing High-Quality Data Synthesis for Animal Pose Estimation via a Controllable Image Generation Pipeline

Apr 01, 2025
Via arXiv

DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data

Mar 25, 2025
Via arXiv

Variational Bayesian Personalized Ranking

Mar 14, 2025
Via arXiv

FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction

Feb 27, 2025
Via arXiv

Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding

Jan 03, 2025
Via arXiv

Scalable Hierarchical Reinforcement Learning for Hyper Scale Multi-Robot Task Planning

Dec 27, 2024
Via arXiv

DriveMM: All-in-One Large Multimodal Model for Autonomous Driving

Dec 10, 2024
Via arXiv