Kaixun Jiang

GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning

Jan 26, 2026
Via arXiv

RSAgent: Learning to Reason and Act for Text-Guided Segmentation via Multi-Turn Tool Invocations

Dec 30, 2025
Via arXiv

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Dec 15, 2025
Via arXiv

Seeing is Believing: Rich-Context Hallucination Detection for MLLMs via Backward Visual Grounding

Nov 15, 2025
Via arXiv

Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection

Nov 14, 2025
Via arXiv

LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops

Jun 17, 2025
Via arXiv

VideoPure: Diffusion-based Adversarial Purification for Video Recognition

Jan 25, 2025
Via arXiv

DeTrack: In-model Latent Denoising Learning for Visual Object Tracking

Jan 05, 2025
Via arXiv

X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation

Sep 28, 2024
Via arXiv

General Compression Framework for Efficient Transformer Object Tracking

Sep 26, 2024
Via arXiv