DemaFormer: Damped Exponential Moving Average Transformer with Energy-Based Modeling for Temporal Language Grounding (2312.02549v1)

Published 5 Dec 2023 in cs.CV and cs.CL

Abstract: Temporal Language Grounding seeks to localize video moments that semantically correspond to a natural language query. Recent advances employ the attention mechanism to learn the relations between video moments and the text query. However, naive attention might not be able to appropriately capture such relations, resulting in ineffective distributions where target video moments are difficult to separate from the remaining ones. To resolve the issue, we propose an energy-based model framework to explicitly learn moment-query distributions. Moreover, we propose DemaFormer, a novel Transformer-based architecture that utilizes exponential moving average with a learnable damping factor to effectively encode moment-query inputs. Comprehensive experiments on four public temporal language grounding datasets showcase the superiority of our methods over the state-of-the-art baselines.
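To make the core mechanism concrete, the sketch below illustrates a damped exponential moving average with a learnable damping factor, using the common recurrence y_t = α ⊙ x_t + (1 − α ⊙ δ) ⊙ y_{t−1}, where α is a learnable smoothing factor and δ the learnable damping factor. This is an assumed parameterization for illustration (similar to multi-dimensional damped EMA variants in the literature); the class name `DampedEMA` and the sigmoid constraint are hypothetical, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class DampedEMA(nn.Module):
    """Illustrative damped exponential moving average layer.

    Assumes the recurrence y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1},
    with per-dimension learnable smoothing alpha and damping delta, both
    constrained to (0, 1) via a sigmoid. A sketch of the general technique,
    not DemaFormer's exact parameterization.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Unconstrained logits; sigmoid keeps alpha and delta in (0, 1).
        self.alpha_logit = nn.Parameter(torch.zeros(dim))
        self.delta_logit = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        alpha = torch.sigmoid(self.alpha_logit)
        delta = torch.sigmoid(self.delta_logit)
        decay = 1.0 - alpha * delta  # damped per-dimension decay rate
        y = torch.zeros_like(x[:, 0])
        outputs = []
        for t in range(x.size(1)):
            y = alpha * x[:, t] + decay * y
            outputs.append(y)
        return torch.stack(outputs, dim=1)


if __name__ == "__main__":
    layer = DampedEMA(dim=8)
    out = layer(torch.randn(2, 16, 8))
    print(out.shape)  # torch.Size([2, 16, 8])
```

With δ close to 0 the recurrence reduces to a plain EMA; learning δ lets the model dampen how strongly past states persist, which is the knob the paper's title refers to.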

Authors (6)
  1. Thong Nguyen (38 papers)
  2. Xiaobao Wu (43 papers)
  3. Xinshuai Dong (25 papers)
  4. Cong-Duy Nguyen (16 papers)
  5. See-Kiong Ng (103 papers)
  6. Luu Anh Tuan (55 papers)
Citations (6)
