Wenhai Wang

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Oct 21, 2024

MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
Oct 15, 2024

Optimizing 4D Lookup Table for Low-light Video Enhancement via Wavelet Priori
Sep 13, 2024

Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks
Aug 18, 2024

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
Aug 16, 2024

Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM
Aug 14, 2024

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Jul 22, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Jul 03, 2024

Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization
Jun 17, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Jun 13, 2024