Ran He

R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning

Apr 15, 2025
Via arXiv

Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?

Apr 14, 2025
Via arXiv

ID-Cloak: Crafting Identity-Specific Cloaks Against Personalized Text-to-Image Generation

Feb 12, 2025
Via arXiv

Survey on AI-Generated Media Detection: From Non-MLLM to MLLM

Feb 07, 2025
Via arXiv

InfoBFR: Real-World Blind Face Restoration via Information Bottleneck

Jan 26, 2025
Via arXiv

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Jan 03, 2025
Via arXiv

Sample Correlation for Fingerprinting Deep Face Recognition

Dec 30, 2024
Via arXiv

Towards Compatible Fine-tuning for Vision-Language Model Updates

Dec 30, 2024
Via arXiv

Prototypical Distillation and Debiased Tuning for Black-box Unsupervised Domain Adaptation

Dec 30, 2024
Via arXiv

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs

Dec 02, 2024
Via arXiv