Picture for Zuxuan Wu

Zuxuan Wu

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

Add code
Dec 05, 2024
Figure 1 for CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
Figure 2 for CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
Figure 3 for CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
Figure 4 for CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
Viaarxiv icon

Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning

Add code
Dec 04, 2024
Figure 1 for Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
Figure 2 for Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
Figure 3 for Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
Figure 4 for Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
Viaarxiv icon

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection

Add code
Nov 29, 2024
Figure 1 for ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
Figure 2 for ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
Figure 3 for ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
Figure 4 for ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
Viaarxiv icon

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Add code
Nov 26, 2024
Viaarxiv icon

Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Add code
Nov 25, 2024
Figure 1 for Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
Figure 2 for Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
Figure 3 for Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
Figure 4 for Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
Viaarxiv icon

REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents

Add code
Nov 20, 2024
Viaarxiv icon

Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

Add code
Oct 27, 2024
Viaarxiv icon

DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation

Add code
Sep 11, 2024
Figure 1 for DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
Figure 2 for DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
Figure 3 for DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
Figure 4 for DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
Viaarxiv icon

GenRec: Unifying Video Generation and Recognition with Diffusion Models

Add code
Aug 27, 2024
Figure 1 for GenRec: Unifying Video Generation and Recognition with Diffusion Models
Figure 2 for GenRec: Unifying Video Generation and Recognition with Diffusion Models
Figure 3 for GenRec: Unifying Video Generation and Recognition with Diffusion Models
Figure 4 for GenRec: Unifying Video Generation and Recognition with Diffusion Models
Viaarxiv icon

Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers

Add code
Aug 03, 2024
Figure 1 for Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers
Figure 2 for Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers
Figure 3 for Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers
Figure 4 for Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers
Viaarxiv icon