Kun He

On the Efficiency of Sinkhorn-Knopp for Entropically Regularized Optimal Transport

Apr 04, 2026
Via arXiv

SHOW3D: Capturing Scenes of 3D Hands and Objects in the Wild

Mar 30, 2026
Via arXiv

Glove2Hand: Synthesizing Natural Hand-Object Interaction from Multi-Modal Sensing Gloves

Mar 21, 2026
Via arXiv

ITO: Images and Texts as One via Synergizing Multiple Alignment and Training-Time Fusion

Mar 04, 2026
Via arXiv

Separators in Enhancing Autoregressive Pretraining for Vision Mamba

Mar 04, 2026
Via arXiv

iGVLM: Dynamic Instruction-Guided Vision Encoding for Question-Aware Multimodal Understanding

Mar 03, 2026
Via arXiv

KVSmooth: Mitigating Hallucination in Multi-modal Large Language Models through Key-Value Smoothing

Feb 04, 2026
Via arXiv

Effective and Stealthy One-Shot Jailbreaks on Deployed Mobile Vision-Language Agents

Oct 09, 2025
Via arXiv

ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers

Aug 17, 2025
Via arXiv

VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models

May 26, 2025
Via arXiv