Chengyue Wu

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Dec 13, 2024

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Nov 12, 2024

Autoregressive Models in Vision: A Survey

Nov 08, 2024

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Oct 17, 2024

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

May 13, 2024

Adapting LLaMA Decoder to Vision Transformer

Apr 13, 2024

FiT: Flexible Vision Transformer for Diffusion Model

Feb 19, 2024

LLaMA Pro: Progressive LLaMA with Block Expansion

Jan 04, 2024

π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

Apr 28, 2023

Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation

Apr 25, 2023