
Xiaosong Zhang

Emu3: Next-Token Prediction is All You Need

Sep 27, 2024, via arXiv

Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS

Aug 16, 2024, via arXiv

Do As I Do: Pose Guided Human Motion Copy

Jun 24, 2024, via arXiv

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Feb 06, 2024, via arXiv

Generative Multimodal Models are In-Context Learners

Dec 20, 2023, via arXiv

CapsFusion: Rethinking Image-Text Data at Scale

Nov 02, 2023, via arXiv

Generative Pretraining in Multimodality

Jul 11, 2023, via arXiv

SegGPT: Segmenting Everything In Context

Apr 06, 2023, via arXiv

HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling

May 30, 2022, via arXiv

Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

May 19, 2022, via arXiv