Zhenbo Luo

Xiaomi MiMo-VL-Miloco Technical Report

Dec 22, 2025

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

Nov 17, 2025

HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration

Oct 31, 2025

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition

Sep 19, 2025

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

Sep 19, 2025

Lightweight speech enhancement guided target speech extraction in noisy multi-speaker scenarios

Aug 27, 2025

Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle

Aug 07, 2025

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

May 22, 2025

VOC-ReID: Vehicle Re-identification based on Vehicle-Orientation-Camera

May 15, 2020

Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation

May 15, 2019