Picture for Xiang Bai

Xiang Bai

Huazhong University of Science and Technology

R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

Add code
Oct 23, 2024
Viaarxiv icon

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

Add code
Oct 21, 2024
Viaarxiv icon

MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark

Add code
Oct 15, 2024
Viaarxiv icon

Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning

Add code
Oct 10, 2024
Viaarxiv icon

VIRT: Vision Instructed Transformer for Robotic Manipulation

Add code
Oct 09, 2024
Viaarxiv icon

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

Add code
Oct 08, 2024
Viaarxiv icon

Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression

Add code
Sep 01, 2024
Figure 1 for Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
Figure 2 for Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
Figure 3 for Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
Figure 4 for Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
Viaarxiv icon

Attention-Guided Perturbation for Unsupervised Image Anomaly Detection

Add code
Aug 14, 2024
Viaarxiv icon

Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models

Add code
Aug 09, 2024
Viaarxiv icon

Mini-Monkey: Alleviate the Sawtooth Effect by Multi-Scale Adaptive Cropping

Add code
Aug 04, 2024
Viaarxiv icon