
Bin Qin

GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models

Jan 26, 2026
Via arXiv

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Jan 15, 2026
Via arXiv

MCGA: A Multi-task Classical Chinese Literary Genre Audio Corpus

Jan 14, 2026
Via arXiv

IMSE: Efficient U-Net-based Speech Enhancement using Inception Depthwise Convolution and Amplitude-Aware Linear Attention

Nov 18, 2025
Via arXiv

DevPiolt: Operation Recommendation for IoT Devices at Xiaomi Home

Nov 18, 2025
Via arXiv

HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration

Oct 31, 2025
Via arXiv

ACPO: Adaptive Curriculum Policy Optimization for Aligning Vision-Language Models in Complex Reasoning

Oct 01, 2025
Via arXiv

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

Sep 19, 2025
Via arXiv

Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle

Aug 07, 2025
Via arXiv

CellCLAT: Preserving Topology and Trimming Redundancy in Self-Supervised Cellular Contrastive Learning

May 27, 2025
Via arXiv