Abstract: The rapid obsolescence of information in Large Language Models (LLMs) has driven the development of various techniques to incorporate new facts. However, existing methods for knowledge editing still face difficulties with multi-hop questions that require accurate fact identification and sequential logical reasoning, particularly when many facts have been updated. To tackle these challenges, this paper introduces Graph Memory-based Editing for Large Language Models (GMeLLo), a straightforward and effective method that merges the explicit knowledge representation of Knowledge Graphs (KGs) with the linguistic flexibility of LLMs. Beyond merely leveraging LLMs for question answering, GMeLLo employs these models to convert free-form language into structured queries and fact triples, facilitating seamless interaction with KGs for rapid updates and precise multi-hop reasoning. Our results show that GMeLLo significantly surpasses current state-of-the-art knowledge editing methods on the multi-hop question answering benchmark MQuAKE, especially in scenarios with extensive knowledge edits.
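To make the pipeline concrete, here is a minimal sketch of how a GMeLLo-style system might be wired together, not the authors' implementation: the two stubbed functions stand in for LLM calls that extract a fact triple from an edited statement and convert a multi-hop question into a relation chain, while a toy in-memory triple store plays the role of the KG. The entity names, relation names, and function signatures are illustrative assumptions, not the paper's actual schema or code.

```python
# Illustrative sketch of a GMeLLo-style pipeline (hypothetical names throughout).
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


class TripleStore:
    """Toy in-memory knowledge graph keyed by (subject, relation)."""

    def __init__(self, triples: List[Triple]) -> None:
        self.facts: Dict[Tuple[str, str], str] = {}
        for s, r, o in triples:
            self.facts[(s, r)] = o

    def update(self, triple: Triple) -> None:
        s, r, o = triple
        self.facts[(s, r)] = o  # a knowledge edit simply overwrites the old object

    def query_chain(self, start: str, relations: List[str]) -> Optional[str]:
        """Follow a chain of relations hop by hop (multi-hop reasoning)."""
        entity: Optional[str] = start
        for rel in relations:
            entity = self.facts.get((entity, rel))
            if entity is None:
                return None
        return entity


def fact_to_triple(edited_fact: str) -> Triple:
    # Stand-in for an LLM call; hypothetical output for
    # "Sam Altman is the CEO of OpenAI."
    return ("Sam Altman", "is_ceo_of", "OpenAI")


def question_to_chain(question: str) -> Tuple[str, List[str]]:
    # Stand-in for an LLM call that converts a multi-hop question into a
    # structured query; hypothetical output for
    # "In which country is the company led by Sam Altman headquartered?"
    return "Sam Altman", ["is_ceo_of", "headquartered_in", "country"]


kg = TripleStore([
    ("OpenAI", "headquartered_in", "San Francisco"),
    ("San Francisco", "country", "United States"),
])
kg.update(fact_to_triple("Sam Altman is the CEO of OpenAI."))

start, chain = question_to_chain(
    "In which country is the company led by Sam Altman headquartered?")
print(kg.query_chain(start, chain))  # -> United States
```

Because edits land directly in the triple store, they take effect immediately, and the multi-hop answer is obtained by traversing the graph rather than by relying on the LLM's parametric memory.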
Abstract: While VideoQA Transformer models demonstrate competitive performance on standard benchmarks, the reasons behind their success remain unclear. Do these models jointly capture and leverage the rich multimodal structures and dynamics from video and text? Or are they merely exploiting shortcuts to achieve high scores? We analyze this with $\textit{QUAG}$ (QUadrant AveraGe), a lightweight and non-parametric probe that systematically ablates the model's coupled multimodal understanding during inference. Surprisingly, QUAG reveals that the models maintain high performance even when injected with multimodal sub-optimality. Additionally, even after replacing self-attention in the multimodal fusion blocks with "QUAG-attention", a simplistic and less-expressive variant of self-attention, the models maintain high performance. This means that current VideoQA benchmarks and their metrics do not penalize shortcuts that discount joint multimodal understanding. Motivated by this, we propose the $\textit{CLAVI}$ (Counterfactual in LAnguage and VIdeo) benchmark, a diagnostic dataset for benchmarking coupled multimodal understanding in VideoQA through counterfactuals. CLAVI consists of temporal questions and videos that are augmented to curate balanced counterfactuals in the language and video domains. Hence, it both incentivizes joint multimodal reasoning and identifies the reliability of the learnt multimodal representations. We evaluate models on CLAVI and find that they achieve high performance on multimodal shortcut instances but perform very poorly on the counterfactuals. Hence, we position CLAVI as a litmus test to identify, diagnose, and improve the sub-optimality of learnt multimodal VideoQA representations, which current benchmarks are unable to assess.
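For intuition, below is a minimal sketch of quadrant averaging applied to a fused self-attention map. It assumes the token sequence is ordered [video | text] with a known modality boundary and that the probe replaces entries in the selected quadrants with their row-wise mean at inference time; the function name and exact averaging scheme are assumptions for illustration, not the authors' code.

```python
# Illustrative quadrant-averaging probe over a post-softmax attention matrix.
import numpy as np


def quadrant_average(attn: np.ndarray, boundary: int, quadrants) -> np.ndarray:
    """attn: (T, T) post-softmax attention weights for a fused [video | text]
    sequence; boundary: index where text tokens start; quadrants: any subset
    of {"vv", "vt", "tv", "tt"} to ablate."""
    out = attn.copy()
    spans = {"v": slice(0, boundary), "t": slice(boundary, attn.shape[0])}
    for q in quadrants:
        rows, cols = spans[q[0]], spans[q[1]]
        block = out[rows, cols]
        # Replace every entry in the quadrant with its row-wise average,
        # preserving each row's total attention mass toward that modality.
        out[rows, cols] = block.mean(axis=1, keepdims=True)
    return out


# Example: ablate the two cross-modal quadrants of a random attention map.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 6))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
ablated = quadrant_average(attn, boundary=4, quadrants=["vt", "tv"])
assert np.allclose(ablated.sum(axis=1), 1.0)  # rows still sum to one
```

If a model's answers barely change when such cross-modal quadrants are flattened, its predictions plausibly do not depend on fine-grained video-text coupling, which is the kind of sub-optimality the probe is meant to expose.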