Zhao Xue

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Jul 02, 2025
Via arXiv

CogVLM2: Visual Language Models for Image and Video Understanding

Aug 29, 2024
Via arXiv

WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models

Mar 30, 2022
Via arXiv