
Hangbo Bao

Multimodal Latent Language Modeling with Next-Token Diffusion

Dec 11, 2024

A Unified View of Masked Image Modeling

Oct 19, 2022

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

Aug 31, 2022

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers

Aug 12, 2022

VL-BEiT: Generative Vision-Language Pretraining

Jun 02, 2022

THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption

Jun 02, 2022

Corrupted Image Modeling for Self-Supervised Visual Pre-Training

Feb 07, 2022

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts

Nov 03, 2021

s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning

Oct 26, 2021

Learning to Sample Replacements for ELECTRA Pre-Training

Jun 25, 2021