Zining Wang

AnalogSAGE: Self-evolving Analog Design Multi-Agents with Stratified Memory and Grounded Experience

Dec 27, 2025
Via arXiv

DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning

Dec 14, 2025
Via arXiv

MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping

Nov 19, 2025
Via arXiv

SentiMM: A Multimodal Multi-Agent Framework for Sentiment Analysis in Social Media

Aug 25, 2025
Via arXiv

WikiGap: Promoting Epistemic Equity by Surfacing Knowledge Gaps Between English Wikipedia and other Language Editions

May 30, 2025
Via arXiv

Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving

Mar 27, 2025
Via arXiv

Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding

Mar 18, 2025
Via arXiv

A Token-level Text Image Foundation Model for Document Understanding

Mar 04, 2025
Via arXiv

Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review

Feb 23, 2025
Via arXiv

InstructOCR: Instruction Boosting Scene Text Spotting

Dec 20, 2024
Via arXiv