Shuhei Kurita

Constructing Multimodal Datasets from Scratch for Rapid Development of a Japanese Visual Language Model

Oct 30, 2024
Via arXiv

AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering

Jul 28, 2024
Via arXiv

Answerability Fields: Answerable Location Estimation via Diffusion Models

Jul 26, 2024
Via arXiv

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Jul 04, 2024
Via arXiv

CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information

Jun 20, 2024
Via arXiv

Map-based Modular Approach for Zero-shot Embodied Question Answering

May 26, 2024
Via arXiv

Text-driven Affordance Learning from Egocentric Vision

Apr 03, 2024
Via arXiv

JDocQA: Japanese Document Question Answering Dataset for Generative Language Models

Mar 28, 2024
Via arXiv

Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction

Feb 28, 2024
Via arXiv

SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition

Jan 18, 2024
Via arXiv