Nelson Higuera

Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets

Oct 12, 2024

Thomas Eiter, Jan Hadl, Nelson Higuera, Johannes Oetsch

Abstract: Visual Question Answering (VQA) is the task of answering a question about an image and requires processing multimodal input and reasoning to obtain the answer. Modular solutions that use declarative representations within the reasoning component have a clear advantage over end-to-end trained systems regarding interpretability. The downside is that crafting the rules for such a component can be an additional burden on the developer. We address this challenge by presenting an approach for declarative knowledge distillation from Large Language Models (LLMs). Our method is to prompt an LLM to extend an initial theory on VQA reasoning, given as an answer-set program, to meet the requirements of the VQA task. Examples from the VQA dataset are used to guide the LLM, validate the results, and repair incorrect rules using feedback from the ASP solver. We demonstrate that our approach works on the prominent CLEVR and GQA datasets. Our results confirm that distilling knowledge from LLMs is a promising direction alongside data-driven rule learning approaches.

* Presented at NeLaMKRR@KR, 2024 (arXiv:2410.05339)
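
The loop described in the abstract (prompt the LLM for rules, validate them on dataset examples, repair them using solver feedback) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `distill_rules`, `propose`, `solve`, and the toy stand-ins are hypothetical names, and the LLM and the ASP solver are replaced by plain Python callables so the sketch runs on its own.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical example format: (question, expected answer).
Example = Tuple[str, str]
# Hypothetical solver result: (answer or None, error message or None).
SolveResult = Tuple[Optional[str], Optional[str]]


def distill_rules(theory: str,
                  examples: List[Example],
                  propose: Callable[[str, Example, Optional[str]], str],
                  solve: Callable[[str, str], SolveResult],
                  max_repairs: int = 3) -> str:
    """Extend `theory` until each example is answered or repairs run out."""
    for example in examples:
        question, expected = example
        answer, error = solve(theory, question)
        if answer == expected:
            continue  # the current theory already covers this example
        feedback = error
        for _ in range(max_repairs):
            # Ask the (stand-in) LLM for new rules, passing solver feedback.
            candidate = theory + "\n" + propose(theory, example, feedback)
            answer, error = solve(candidate, question)
            if answer == expected:
                theory = candidate  # keep rules that pass validation
                break
            feedback = error or f"got {answer!r}, expected {expected!r}"
    return theory


# Toy stand-ins (assumptions): the first proposal is malformed and the
# retry, made after receiving feedback, is accepted.
def toy_propose(theory: str, example: Example, feedback: Optional[str]) -> str:
    return "count_rule_fixed." if feedback else "count_rule_broken"


def toy_solve(theory: str, question: str) -> SolveResult:
    if "count_rule_broken" in theory:
        return None, "syntax error"
    if "count_rule_fixed." in theory:
        return "2", None
    return None, None


result = distill_rules("base.", [("how many cubes?", "2")],
                       toy_propose, toy_solve)
```

In a real setting, `propose` would wrap an LLM call and `solve` would run an ASP solver on the theory together with an encoding of the scene and question; both are left abstract here.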

A Neuro-Symbolic ASP Pipeline for Visual Question Answering

May 16, 2022

Thomas Eiter, Nelson Higuera, Johannes Oetsch, Michael Pritz

Abstract: We present a neuro-symbolic visual question answering (VQA) pipeline for CLEVR, which is a well-known dataset that consists of pictures showing scenes with objects and questions related to them. Our pipeline covers (i) training neural networks for object classification and bounding-box prediction of the CLEVR scenes, (ii) statistical analysis on the distribution of prediction values of the neural networks to determine a threshold for high-confidence predictions, and (iii) a translation of CLEVR questions and network predictions that pass confidence thresholds into logic programs so that we can compute the answers using an ASP solver. By exploiting choice rules, we consider deterministic and non-deterministic scene encodings. Our experiments show that, in comparison with the deterministic approach, the non-deterministic scene encoding achieves good results even if the neural networks are trained rather poorly. This is important for building robust VQA systems if network predictions are less than perfect. Furthermore, we show that restricting non-determinism to reasonable choices allows for more efficient implementations in comparison with related neuro-symbolic approaches without losing much accuracy. This work is under consideration for acceptance in TPLP.

* Paper presented at the 38th International Conference on Logic Programming (ICLP 2022), 15 pages
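
As a rough illustration of step (iii) and of the two scene encodings, the sketch below turns per-object predictions into ASP facts (deterministic) or choice rules over the values that pass a confidence threshold (non-deterministic). The input layout, predicate names, threshold value, and the fallback to the top prediction are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical input format: one dict per detected object, mapping an
# attribute name to {value: confidence}.
Prediction = Dict[str, Dict[str, float]]


def encode_scene(objects: List[Prediction], threshold: float,
                 deterministic: bool = False) -> List[str]:
    """Translate network predictions into ASP facts or choice rules."""
    rules: List[str] = []
    for idx, attrs in enumerate(objects):
        obj = f"o{idx}"
        rules.append(f"object({obj}).")
        for attr, scores in sorted(attrs.items()):
            best = max(scores, key=scores.get)
            if deterministic:
                candidates = [best]  # commit to the top prediction only
            else:
                # Keep every value whose confidence passes the threshold.
                candidates = sorted(v for v, s in scores.items()
                                    if s >= threshold)
                if not candidates:
                    candidates = [best]  # assumed fallback, not from the paper
            atoms = [f"{attr}({obj},{v})" for v in candidates]
            if len(atoms) == 1:
                rules.append(atoms[0] + ".")
            else:
                # Choice rule: exactly one of the candidate atoms holds.
                rules.append("1 { " + "; ".join(atoms) + " } 1.")
    return rules


scene = [{"color": {"red": 0.55, "brown": 0.40, "blue": 0.05},
          "shape": {"cube": 0.97, "sphere": 0.03}}]
nondet_rules = encode_scene(scene, threshold=0.3)
det_rules = encode_scene(scene, threshold=0.3, deterministic=True)
```

With the non-deterministic encoding, the solver can still reach the correct answer when the top-ranked color is wrong but the right one passed the threshold; limiting the choice to such candidates keeps the number of answer sets small.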
