Xiaoshuai Sun

StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization

Dec 10, 2024
Via arXiv

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Dec 05, 2024

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation

Dec 03, 2024

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

Nov 29, 2024

Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding

Nov 25, 2024

Any-to-3D Generation via Hybrid Diffusion Supervision

Nov 22, 2024

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Oct 17, 2024

DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion

Oct 06, 2024

I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing

Aug 26, 2024

TraDiffusion: Trajectory-Based Training-Free Image Generation

Aug 19, 2024