Wenwei Zhang

RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy

Mar 31, 2025
Via arXiv

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Mar 27, 2025
Via arXiv

SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining

Mar 25, 2025
Via arXiv

Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs

Mar 04, 2025
Via arXiv

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Feb 10, 2025
Via arXiv

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

Jan 21, 2025
Via arXiv

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Jan 07, 2025
Via arXiv

LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving

Jan 07, 2025
Via arXiv

Are Your LLMs Capable of Stable Reasoning?

Dec 17, 2024
Via arXiv

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Dec 12, 2024
Via arXiv