Wenliang Dai

NVLM: Open Frontier-Class Multimodal LLMs

Sep 17, 2024

Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models

Oct 09, 2023

Survey of Social Bias in Vision-Language Models

Sep 24, 2023

Visual Instruction Tuning with Polite Flamingo

Jul 03, 2023

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

May 11, 2023

A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

Feb 28, 2023

NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Dec 20, 2022

Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

Oct 14, 2022

Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

Jul 06, 2022

Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

Mar 30, 2022