Zhuofan Zong

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Dec 12, 2024
Via arXiv

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Jun 17, 2024
Via arXiv

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Apr 19, 2024
Via arXiv

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Apr 04, 2024
Via arXiv

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

Mar 25, 2024
Via arXiv

RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths

May 29, 2023
Via arXiv

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

Apr 03, 2023
Via arXiv

DETRs with Collaborative Hybrid Assignments Training

Nov 22, 2022
Via arXiv

Self-slimmed Vision Transformer

Nov 24, 2021
Via arXiv

RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection

Oct 23, 2021
Via arXiv