
Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning (2004.00163v2)

Published 31 Mar 2020 in cs.CV, cs.LG, and stat.ML

Abstract: Weakly-supervised action localization requires training a model to localize the action segments in a video given only video-level action labels. It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains multiple instances (action segments). Since only the bag's label is known, the main challenge is identifying which key instances within the bag trigger the bag's label. Most previous models use attention-based approaches, applying attention to generate the bag's representation from its instances and then training it via the bag's classification. These models, however, implicitly violate the MIL assumption that instances in negative bags should be uniformly negative. In this work, we explicitly model the key-instance assignment as a hidden variable and adopt an Expectation-Maximization (EM) framework. We derive two pseudo-label generation schemes to model the E and M steps and iteratively optimize the likelihood lower bound. We show that our EM-MIL approach more accurately models both the learning objective and the MIL assumptions. It achieves state-of-the-art performance on two standard benchmarks, THUMOS14 and ActivityNet1.2.
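
The MIL assumption mentioned in the abstract can be made concrete with a small sketch. This is not the paper's pseudo-label scheme, only an illustration of the standard assumption: a bag is positive if at least one instance is positive, and every instance in a negative bag is negative. The function names, the `threshold` parameter, and the fallback to the top-scoring instance are hypothetical choices made for this example.

```python
import numpy as np

def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive iff any instance is positive."""
    return int(np.any(np.asarray(instance_labels) == 1))

def pseudo_labels(scores, bag_is_positive, threshold=0.5):
    """Hypothetical key-instance assignment (illustrative, not the paper's scheme).

    scores: per-instance scores in [0, 1] from some classifier (assumed given).
    Returns a 0/1 pseudo-label per instance that respects the MIL assumption.
    """
    scores = np.asarray(scores, dtype=float)
    if not bag_is_positive:
        # Instances in a negative bag are uniformly negative, whatever their scores.
        return np.zeros(len(scores), dtype=int)
    labels = (scores > threshold).astype(int)
    if labels.sum() == 0:
        # A positive bag must contain at least one key instance;
        # fall back to the highest-scoring one.
        labels[np.argmax(scores)] = 1
    return labels

# A negative video never yields positive segments, even with high scores.
print(pseudo_labels([0.9, 0.8], bag_is_positive=False))
# A positive video keeps at least one key segment.
print(pseudo_labels([0.1, 0.3, 0.2], bag_is_positive=True))
```

In an EM-style loop, labels like these would be regenerated from the current model and then used as training targets in the next update. How the paper actually derives its two schemes is not stated in the abstract.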

Authors (7)
  1. Zhekun Luo (3 papers)
  2. Devin Guillory (10 papers)
  3. Baifeng Shi (17 papers)
  4. Wei Ke (40 papers)
  5. Fang Wan (44 papers)
  6. Trevor Darrell (324 papers)
  7. Huijuan Xu (30 papers)
Citations (115)
