Caixin Kang

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

Feb 13, 2026

Towards Interactive Intelligence for Digital Humans

Dec 15, 2025

Living the Novel: A System for Generating Self-Training Timeline-Aware Conversational Agents from Novels

Dec 08, 2025

Can MLLMs Read the Room? A Multimodal Benchmark for Verifying Truthfulness in Multi-Party Social Interactions

Oct 31, 2025

From reactive to cognitive: brain-inspired spatial intelligence for embodied agents

Aug 24, 2025

Towards NSFW-Free Text-to-Image Generation via Safety-Constraint Direct Preference Optimization

Apr 19, 2025

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Jan 09, 2025

AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?

Dec 04, 2024

OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations

Dec 03, 2024

Real-world Adversarial Defense against Patch Attacks based on Diffusion Model

Sep 14, 2024