Songze Li

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Jun 10, 2026
Via arXiv

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Jun 04, 2026
Via arXiv

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

May 29, 2026
Via arXiv

Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models

May 19, 2026
Via arXiv

ASTRA: Adaptive Semantic Tree Reasoning Architecture for Complex Table Question Answering

Apr 10, 2026
Via arXiv

What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning

Apr 08, 2026
Via arXiv

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

Mar 29, 2026
Via arXiv

Beauty and the Beast: Imperceptible Perturbations Against Diffusion-Based Face Swapping via Directional Attribute Editing

Jan 30, 2026
Via arXiv

Noise as a Probe: Membership Inference Attacks on Diffusion Models Leveraging Initial Noise

Jan 29, 2026
Via arXiv

Temp-R1: A Unified Autonomous Agent for Complex Temporal KGQA via Reverse Curriculum Reinforcement Learning

Jan 26, 2026
Via arXiv