Picture for Rongrong Ji

Rongrong Ji

Xiamen University, Peng Cheng Laboratory

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Add code
Jan 03, 2025
Viaarxiv icon

SVFR: A Unified Framework for Generalized Video Face Restoration

Add code
Jan 03, 2025
Figure 1 for SVFR: A Unified Framework for Generalized Video Face Restoration
Figure 2 for SVFR: A Unified Framework for Generalized Video Face Restoration
Figure 3 for SVFR: A Unified Framework for Generalized Video Face Restoration
Figure 4 for SVFR: A Unified Framework for Generalized Video Face Restoration
Viaarxiv icon

Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers

Add code
Dec 21, 2024
Viaarxiv icon

DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On

Add code
Dec 19, 2024
Viaarxiv icon

Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search

Add code
Dec 19, 2024
Viaarxiv icon

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Add code
Dec 05, 2024
Figure 1 for FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Figure 2 for FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Figure 3 for FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Figure 4 for FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Viaarxiv icon

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation

Add code
Dec 03, 2024
Viaarxiv icon

AccDiffusion v2: Towards More Accurate Higher-Resolution Diffusion Extrapolation

Add code
Dec 03, 2024
Viaarxiv icon

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

Add code
Nov 29, 2024
Figure 1 for Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
Figure 2 for Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
Figure 3 for Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
Figure 4 for Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
Viaarxiv icon

Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding

Add code
Nov 25, 2024
Viaarxiv icon