Rongrong Ji

Xiamen University, Peng Cheng Laboratory

Training-free Anomaly Event Detection via LLM-guided Symbolic Pattern Discovery

Feb 09, 2025

AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection

Feb 08, 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Feb 07, 2025

Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting

Jan 30, 2025

SVFR: A Unified Framework for Generalized Video Face Restoration

Jan 03, 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Jan 03, 2025

Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers

Dec 21, 2024

DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On

Dec 19, 2024

Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search

Dec 19, 2024

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Dec 05, 2024