Wenhai Wang

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Dec 12, 2024

DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction

Dec 09, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024

MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost

Dec 02, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Nov 15, 2024

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

Oct 21, 2024

MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

Oct 15, 2024

Optimizing 4D Lookup Table for Low-light Video Enhancement via Wavelet Priori

Sep 13, 2024

Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks

Aug 18, 2024

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

Aug 16, 2024