Kiyoharu Aizawa

JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation

Oct 22, 2024

FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation

Sep 27, 2024

Training-Free Sketch-Guided Diffusion with Latent Optimization

Aug 31, 2024

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Jul 31, 2024

MangaUB: A Manga Understanding Benchmark for Large Multimodal Models

Jul 26, 2024

Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion

Apr 24, 2024

Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

Mar 29, 2024

Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes

Mar 24, 2024

Cross-Lingual Learning in Multilingual Scene Text Recognition

Dec 17, 2023

Semantic-Driven Initial Image Construction for Guided Image Synthesis in Diffusion Model

Dec 13, 2023