Botian Shi

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving

Nov 08, 2024
Via arXiv

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

Oct 13, 2024
Via arXiv

MinerU: An Open-Source Solution for Precise Document Content Extraction

Sep 27, 2024
Via arXiv

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes

Sep 06, 2024
Via arXiv

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

Aug 01, 2024
Via arXiv

LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera

Jul 23, 2024
Via arXiv

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Jun 17, 2024
Via arXiv

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 13, 2024
Via arXiv

OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 12, 2024
Via arXiv

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

May 24, 2024
Via arXiv