
Title: LifelongMemory: Leveraging LLMs for Answering Queries in Egocentric Videos

Dec 07, 2023

Ying Wang, Yanlai Yang, Mengye Ren


Abstract: The egocentric video natural language query (NLQ) task involves localizing a temporal window in an egocentric video that provides an answer to a posed query, which has wide applications in building personalized AI assistants. Prior methods for this task have focused on improving network architectures and leveraging pre-training for enhanced image and video features, but they have struggled to capture long-range temporal dependencies in lengthy videos and have required cumbersome end-to-end training. Motivated by recent advancements in Large Language Models (LLMs) and vision language models, we introduce LifelongMemory, a novel framework that utilizes multiple pre-trained models to answer queries from extensive egocentric video content. We address this challenge by employing a pre-trained captioning model to create detailed narratives of the videos. These narratives are then used to prompt a frozen LLM to generate coarse-grained temporal window predictions, which are subsequently refined using a pre-trained NLQ model. Empirical results demonstrate that our method achieves competitive performance against existing supervised end-to-end learning methods, underlining the potential of integrating multiple pre-trained multimodal large language models in complex vision-language tasks. We provide a comprehensive analysis of key design decisions and hyperparameters in our pipeline, offering insights and practical guidelines.
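The three-stage flow described in the abstract (captions, then a frozen LLM proposing coarse windows, then an NLQ model refining them) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the captioner output is assumed to be a list of timestamped captions, the LLM and NLQ model are passed in as plain callables, and the prompt wording, the `[start-end]` answer format, and every name here (`Caption`, `build_prompt`, `parse_windows`, `answer_query`) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

Window = Tuple[float, float]  # (start_sec, end_sec)


@dataclass
class Caption:
    """One timestamped caption, assumed to come from a pre-trained captioner."""
    start: float
    end: float
    text: str


def build_prompt(captions: List[Caption], query: str) -> str:
    """Turn captions into a text narrative for the LLM (hypothetical prompt format)."""
    log = "\n".join(f"[{c.start:.0f}-{c.end:.0f}] {c.text}" for c in captions)
    return (
        "Log of what the camera wearer saw:\n"
        f"{log}\n"
        f"Question: {query}\n"
        "Answer with candidate time intervals written as [start-end]."
    )


def parse_windows(llm_output: str) -> List[Window]:
    """Extract coarse '[start-end]' intervals from free-form LLM text."""
    pattern = r"\[(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)\]"
    return [(float(a), float(b)) for a, b in re.findall(pattern, llm_output)]


def answer_query(
    captions: List[Caption],
    query: str,
    llm: Callable[[str], str],                      # stand-in for a frozen LLM
    refine: Callable[[Window, str], Window],        # stand-in for a pre-trained NLQ model
) -> List[Window]:
    """Coarse windows from the LLM, each refined by the NLQ stand-in."""
    coarse = parse_windows(llm(build_prompt(captions, query)))
    return [refine(w, query) for w in coarse]


if __name__ == "__main__":
    caps = [
        Caption(0, 10, "C opens the fridge"),
        Caption(10, 20, "C puts keys on the table"),
    ]
    fake_llm = lambda prompt: "The keys were put down at [10-20]."
    # Hypothetical refiner: pretends the NLQ model narrows the coarse window.
    fake_refine = lambda w, q: (w[0] + 2, w[1] - 3)
    print(answer_query(caps, "Where did I put my keys?", fake_llm, fake_refine))
```

Passing the models as callables keeps the sketch independent of any particular captioner, LLM API, or NLQ checkpoint; in practice each stub would be replaced by a real model call, and the parsing step would need to tolerate whatever output format the chosen LLM actually produces.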
