Pi Bu

CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games

Mar 12, 2025
Via arXiv

ChineseSimpleVQA -- "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models

Feb 19, 2025
Via arXiv

LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating

Dec 24, 2024
Via arXiv

Token Preference Optimization with Self-Calibrated Visual-Anchored Rewards for Hallucination Mitigation

Dec 19, 2024
Via arXiv