Wenwei Zhang

Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs

Mar 04, 2025
Via arXiv

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Feb 10, 2025
Via arXiv

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

Jan 21, 2025
Via arXiv

LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving

Jan 07, 2025
Via arXiv

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Jan 07, 2025
Via arXiv

Are Your LLMs Capable of Stable Reasoning?

Dec 17, 2024
Via arXiv

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Dec 12, 2024
Via arXiv

Training Language Models to Critique With Multi-agent Feedback

Oct 20, 2024
Via arXiv

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness

Sep 26, 2024
Via arXiv

SLAM assisted 3D tracking system for laparoscopic surgery

Sep 18, 2024
Via arXiv