Koki Maeda

Constructing Multimodal Datasets from Scratch for Rapid Development of a Japanese Visual Language Model

Oct 30, 2024
Via arXiv

COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark

Aug 05, 2024
Via arXiv

Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction

Feb 28, 2024
Via arXiv