Xiangchao Yan

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback

Jan 07, 2025
Via arXiv

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Dec 16, 2024
Via arXiv

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

Oct 13, 2024
Via arXiv

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Jun 17, 2024
Via arXiv

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 13, 2024
Via arXiv

OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 12, 2024
Via arXiv

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Apr 29, 2024
Via arXiv

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Feb 19, 2024
Via arXiv

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving

Sep 25, 2023
Via arXiv

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

Sep 25, 2023
Via arXiv