Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:In-Context Ensemble Improves Video-Language Models for Low-Level Workflow Understanding from Human Demonstrations

Sep 26, 2024

Moucheng Xu, Evangelos Chatzaroulas, Luc McCutcheon, Abdul Ahad, Hamzah Azeem, Janusz Marecki, Ammar Anwar

Share this with someone who'll enjoy it:

Abstract:A Standard Operating Procedure (SOP) defines a low-level, step-by-step written guide for a business software workflow based on a video demonstration. SOPs are a crucial step toward automating end-to-end software workflows. Manually creating SOPs can be time-consuming. Recent advancements in large video-language models offer the potential for automating SOP generation by analyzing recordings of human demonstrations. However, current large video-language models face challenges with zero-shot SOP generation. We explore in-context learning with video-language models for SOP generation. We report that in-context learning sometimes helps video-language models at SOP generation. We then propose an in-context ensemble learning to further enhance the capabilities of the models in SOP generation.

* multimodal in-context ensemble learning, video-language models, SOP generation, pseudo-labels, in-context learning, prompt engineering

View paper on

Share this with someone who'll enjoy it:

Title:In-Context Ensemble Improves Video-Language Models for Low-Level Workflow Understanding from Human Demonstrations

Paper and Code