Zhiliang Peng

Multimodal Latent Language Modeling with Next-Token Diffusion
Dec 11, 2024

Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Oct 04, 2023

Kosmos-2: Grounding Multimodal Large Language Models to the World
Jul 13, 2023

Generic-to-Specific Distillation of Masked Autoencoders
Feb 28, 2023

A Unified View of Masked Image Modeling
Oct 19, 2022

Foundation Transformers

Add code
Oct 19, 2022
Figure 1 for Foundation Transformers
Figure 2 for Foundation Transformers
Figure 3 for Foundation Transformers
Figure 4 for Foundation Transformers
Viaarxiv icon

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Aug 31, 2022

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Aug 12, 2022

Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection
May 19, 2022

Long-tailed Distribution Adaptation
Oct 06, 2021