Dongdong Chen

FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing

Mar 20, 2025
Via arXiv

I2V3D: Controllable image-to-video generation with 3D guidance

Mar 12, 2025
Via arXiv

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Mar 03, 2025
Via arXiv

On the Vulnerability of Concept Erasure in Diffusion Models

Feb 24, 2025
Via arXiv

SmartEraser: Remove Anything from Images using Masked-Region Guidance

Jan 14, 2025
Via arXiv

Benchmarking Large and Small MLLMs

Jan 04, 2025
Via arXiv

Olympus: A Universal Task Router for Computer Vision Tasks

Dec 12, 2024
Via arXiv

MageBench: Bridging Large Multimodal Models to Agents

Dec 05, 2024
Via arXiv

Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation

Nov 27, 2024
Via arXiv

LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation

Nov 26, 2024
Via arXiv