Wenxuan Huang

CLIP-Map: Structured Matrix Mapping for Parameter-Efficient CLIP Compression

Feb 05, 2026, via arXiv

Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models

Feb 02, 2026, via arXiv

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Jan 29, 2026, via arXiv

UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision

Jan 08, 2026, via arXiv

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Dec 18, 2025, via arXiv

Interleaving Reasoning for Better Text-to-Image Generation

Sep 09, 2025, via arXiv

Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback

Jul 28, 2025, via arXiv

AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need

Jun 18, 2025, via arXiv

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

Jun 12, 2025, via arXiv

MT³: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning

May 26, 2025, via arXiv