Yupan Huang

RedStone: Curating General, Code, Math, and QA Data for Large Language Models

Dec 04, 2024

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

Nov 28, 2023

Kosmos-2.5: A Multimodal Literate Model

Sep 20, 2023

Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

Aug 31, 2023

TextDiffuser: Diffusion Models as Text Painters

May 24, 2023

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

Apr 19, 2022

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation

Oct 19, 2021

Unifying Multimodal Transformer for Bi-directional Image and Text Generation

Oct 19, 2021

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training

Jun 28, 2021

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Apr 08, 2021