Yu-Gang Jiang

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

Dec 05, 2024
Via arXiv

Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning

Dec 04, 2024
Via arXiv

SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images

Dec 03, 2024
Via arXiv

DiffPatch: Generating Customizable Adversarial Patches using Diffusion Model

Dec 02, 2024
Via arXiv

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection

Nov 29, 2024
Via arXiv

LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair

Nov 28, 2024
Via arXiv

Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Nov 25, 2024
Via arXiv

SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition

Nov 24, 2024
Via arXiv

REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents

Nov 20, 2024
Via arXiv

Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning

Nov 19, 2024
Via arXiv