In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding (2409.15867v5)

Published 24 Sep 2024 in cs.AI

Abstract: A Standard Operating Procedure (SOP) defines a low-level, step-by-step written guide for a business software workflow. SOP generation is a crucial step towards automating end-to-end software workflows, but creating SOPs manually can be time-consuming. Recent advancements in large video-LLMs offer the potential for automating SOP generation by analyzing recordings of human demonstrations. However, current large video-LLMs face challenges with zero-shot SOP generation. In this work, we first explore in-context learning with video-LLMs for SOP generation. We then propose an exploration-focused strategy called In-Context Ensemble Learning, which aggregates pseudo labels of multiple possible paths of SOPs. The proposed in-context ensemble learning also enables the models to learn beyond their context window limit with an implicit consistency regularisation. We report that in-context learning helps video-LLMs generate more temporally accurate SOPs, and that the proposed in-context ensemble learning consistently enhances the capabilities of video-LLMs in SOP generation.
