
Xiaoshuai Sun

AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection

Feb 08, 2025
Via arXiv

IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation

Jan 09, 2025
Via arXiv

StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization

Dec 10, 2024
Via arXiv

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Dec 05, 2024
Via arXiv

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation

Dec 03, 2024
Via arXiv

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

Nov 29, 2024
Via arXiv

Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding

Nov 25, 2024
Via arXiv

Any-to-3D Generation via Hybrid Diffusion Supervision

Nov 22, 2024
Via arXiv

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Oct 17, 2024
Via arXiv

DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion

Oct 06, 2024
Via arXiv