Title: Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks

May 29, 2024

Simranjit Singh, Georgios Pavlakos, Dimitrios Stamoulis

Abstract: As interest in "reformulating" the 3D Visual Question Answering (VQA) problem in the context of foundation models grows, it is imperative to assess how these new paradigms influence existing closed-vocabulary datasets. In this case study, we evaluate the zero-shot performance of foundation models (GPT-4 Vision and GPT-4) on two well-established 3D VQA benchmarks, 3D-VQA and ScanQA. We provide an investigation to contextualize the performance of GPT-based agents relative to traditional modeling approaches. We find that GPT-based agents, without any fine-tuning, perform on par with closed-vocabulary approaches. Our findings corroborate recent results showing that "blind" models establish a surprisingly strong baseline in closed-vocabulary settings. We demonstrate that agents benefit significantly from scene-specific vocabulary provided via in-context textual grounding. By presenting a preliminary comparison with previous baselines, we hope to inform the community's ongoing efforts to refine multi-modal 3D benchmarks.

* Accepted at the 1st Workshop on Multimodalities for 3D Scenes, CVPR 2024
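
As a rough illustration of the abstract's "in-context textual grounding" idea, the sketch below prepends a scene-specific vocabulary to a question prompt and scores a free-form answer against closed-vocabulary references using normalized exact match. It is not the authors' implementation: the prompt wording, the function names (`build_prompt`, `normalize`, `exact_match`), and the choice of exact match as the metric are all assumptions made for this example, and no model call is included.

```python
import re
from typing import Iterable


def build_prompt(question: str, scene_vocabulary: Iterable[str]) -> str:
    """Prepend a scene-specific answer vocabulary to the question.

    The prompt format is hypothetical, not taken from the paper.
    """
    vocab = ", ".join(sorted(set(scene_vocabulary)))
    return (
        "You are answering a question about a 3D indoor scene.\n"
        f"Candidate answer words for this scene: {vocab}\n"
        f"Question: {question}\n"
        "Answer with a short phrase."
    )


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", answer.lower())
    return " ".join(cleaned.split())


def exact_match(prediction: str, references: Iterable[str]) -> bool:
    """Return True if the normalized prediction equals any normalized reference."""
    target = normalize(prediction)
    return any(target == normalize(ref) for ref in references)


# Illustrative inputs only; these are not drawn from 3D-VQA or ScanQA.
prompt = build_prompt("What is next to the sofa?", ["table", "sofa", "lamp"])
score = exact_match("A lamp.", ["a lamp", "lamp"])
```

Exact match penalizes any free-form answer that is worded differently from the reference annotations, which is one plausible reason a vocabulary hint in the prompt could help an open-ended model on a closed-vocabulary benchmark.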
