- The paper introduces "Deep 360 Pilot", a deep learning agent combining object detection and recurrent neural networks for automatic navigation through 360° sports videos.
- Evaluations on a new "Sports-360" dataset show that "Deep 360 Pilot" achieves higher viewing-angle accuracy and smoother transitions than baseline and prior methods.
- This research provides a robust framework for enhancing immersive multimedia experiences and interactive sports playback by automatically selecting optimal viewer perspectives.
Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Videos
The paper "Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Videos" introduces an innovative approach toward automatic navigation of 360-degree sports videos using a deep learning-based agent. The primary aim of this research is to assist viewers in the cumbersome task of selecting optimal viewing angles as they engage with dynamic content in immersive 360-degree videos, which are increasingly becoming prevalent due to the availability of advanced multimedia capture technology.
Core Methodology
The "Deep 360 Pilot" employs a multi-step process combining object detection and recurrent neural networks (RNNs) to facilitate the automatic selection of viewing angles analogous to a human agent's navigational preferences. The process begins with leveraging a state-of-the-art object detector to identify potential areas of interest within the panoramic frames. Subsequently, an RNN-based selector identifies the main object, which serves as the anchor for determining the next preferred viewing angle. Complementing this is a regressor module that smoothly adjusts the viewing angle by considering both motion attributes of objects and historical viewing data. This advanced policy formulation is crucial as it mimics the nuanced decision-making process involved in real-time video navigation and streaming applications.
Results and Evaluation
The paper demonstrates the effectiveness of this approach through comprehensive experiments on a newly compiled dataset, "Sports-360," which covers five sports domains and is carefully annotated. Comparative analyses show that "Deep 360 Pilot" achieves the highest accuracy and the smoothest viewing-angle transitions relative to established methods such as AUTOCAM and baselines combining saliency detection with object tracking. Against an ablated variant without the regressor, the full model substantially reduces jitter in angle transitions, improving viewer comfort.
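As a concrete illustration of the smoothness criterion, the snippet below computes a simple jitter score over a predicted viewing-angle trajectory. The exact definition used here (mean magnitude of frame-to-frame change in angular velocity) is an assumption for illustration and is not necessarily the metric reported in the paper.

```python
# Illustrative jitter measure for a viewing-angle trajectory; the precise
# metric in the paper may differ.
import numpy as np

def jitter(angles: np.ndarray) -> float:
    """Mean magnitude of frame-to-frame change in angular velocity.

    angles: (T, 2) array of (pan, tilt) in degrees per frame.
    Lower values indicate smoother, more comfortable trajectories.
    """
    vel = np.diff(angles, axis=0)                   # angular velocity
    # Wrap pan differences into [-180, 180) to respect the 360-degree seam.
    vel[:, 0] = (vel[:, 0] + 180.0) % 360.0 - 180.0
    acc = np.diff(vel, axis=0)                      # angular acceleration
    return float(np.abs(acc).mean())

# Example: a trajectory that pans at constant speed scores near zero.
smooth = np.stack([np.linspace(0.0, 90.0, 60), np.zeros(60)], axis=1)
print(jitter(smooth))  # ~0.0
```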
Implications and Future Directions
This research provides a robust framework for automated navigation of immersive multimedia, with potential applications in virtual reality rendering and interactive sports playback. Framing navigation as a learned agent's policy opens new pathways for intelligent video interfaces that dynamically personalize the viewing experience.
Future work prompted by this research might address domain adaptation, where transfer learning could yield more general models that operate across video genres beyond sports. Reducing the annotation overhead of dataset preparation is another promising direction, potentially through unsupervised or weakly supervised learning.
In conclusion, "Deep 360 Pilot" reflects a significant leap in automated video navigation, integrating perceptual strategies with machine learning to redefine how engaging multimedia content can be consumed in modern digital landscapes. This paper serves as a pivotal resource for researchers seeking to evolve interactive media technologies with intelligent systems offering augmented viewer experiences.