Weihan Wang

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Jul 02, 2025
Via arXiv

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Jan 06, 2025
Via arXiv

CogVLM2: Visual Language Models for Image and Video Understanding

Aug 29, 2024
Via arXiv

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Aug 12, 2024
Via arXiv

VIPeR: Visual Incremental Place Recognition with Adaptive Mining and Lifelong Learning

Jul 31, 2024
Via arXiv

LVBench: An Extreme Long Video Understanding Benchmark

Jun 12, 2024
Via arXiv

Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview

May 09, 2024
Via arXiv

Stereo-NEC: Enhancing Stereo Visual-Inertial SLAM Initialization with Normal Epipolar Constraints

Mar 12, 2024
Via arXiv

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

Mar 08, 2024
Via arXiv

CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations

Feb 06, 2024
Via arXiv