Lewei Yao

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Apr 14, 2024

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Mar 07, 2024

PerceptionGPT: Effectively Fusing Visual Perception into LLM

Nov 11, 2023

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Oct 16, 2023

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

Jul 04, 2023

DetGPT: Detect What You Need via Reasoning

May 24, 2023

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

May 04, 2023

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment

Apr 10, 2023

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

Sep 20, 2022

Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework

Mar 10, 2022