Liqiang Nie

Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation

Jan 13, 2025

ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding

Dec 29, 2024

Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM

Dec 20, 2024

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

Dec 08, 2024

The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense

Nov 13, 2024

SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

Oct 19, 2024

Preview-based Category Contrastive Learning for Knowledge Distillation

Oct 18, 2024

RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training

Oct 18, 2024

Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing

Oct 14, 2024

BadCM: Invisible Backdoor Attack Against Cross-Modal Learning

Oct 03, 2024