Xiongkuo Min

Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation

Jan 13, 2026

KidVis: Do Multimodal Large Language Models Possess the Visual Perceptual Capabilities of a 6-Year-Old?

Jan 13, 2026

VTONQA: A Multi-Dimensional Quality Assessment Dataset for Virtual Try-on

Jan 06, 2026

Generative Human-Object Interaction Detection via Differentiable Cognitive Steering of Multi-modal LLMs

Dec 19, 2025

ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation

Nov 18, 2025

VQualA 2025 Challenge on Visual Quality Comparison for Large Multimodal Models: Methods and Results

Sep 11, 2025

Audio-Assisted Face Video Restoration with Temporal and Identity Complementary Learning

Aug 06, 2025

CompressedVQA-HDR: Generalized Full-reference and No-reference Quality Assessment Models for Compressed High Dynamic Range Videos

Jul 16, 2025

Scaling-up Perceptual Video Quality Assessment

May 28, 2025

TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs

May 26, 2025