Zhenjiang Jin

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 13, 2024

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Apr 29, 2024

InternLM2 Technical Report

Mar 26, 2024

WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset

Mar 12, 2024

WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models

Aug 22, 2023