SENSE: a Shared Encoder Network for Scene-flow Estimation (1910.12361v1)

Published 27 Oct 2019 in cs.CV

Abstract: We introduce a compact network for holistic scene flow estimation, called SENSE, which shares common encoder features among four closely-related tasks: optical flow estimation, disparity estimation from stereo, occlusion estimation, and semantic segmentation. Our key insight is that sharing features makes the network more compact, induces better feature representations, and can better exploit interactions among these tasks to handle partially labeled data. With a shared encoder, we can flexibly add decoders for different tasks during training. This modular design leads to a compact and efficient model at inference time. Exploiting the interactions among these tasks allows us to introduce distillation and self-supervised losses in addition to supervised losses, which can better handle partially labeled real-world data. SENSE achieves state-of-the-art results on several optical flow benchmarks and runs as fast as networks specifically designed for optical flow. It also compares favorably against the state of the art on stereo and scene flow, while consuming much less memory.

Citations (73)

Summary

The paper introduces a novel shared encoder that unifies optical flow, disparity, occlusion, and segmentation tasks in one compact architecture.
The approach uses a modular design: a ResNet-like shared encoder with pyramid pooling feeds separate task-specific decoders.
Empirical results on MPI Sintel and KITTI show state-of-the-art optical flow accuracy, competitive stereo and scene flow results, and low memory consumption.

An Expert Overview of "SENSE: a Shared Encoder Network for Scene-flow Estimation"

The paper "SENSE: a Shared Encoder Network for Scene-flow Estimation" introduces an approach to scene flow estimation that uses a single shared encoder for four interconnected tasks: optical flow estimation, stereo disparity estimation, occlusion estimation, and semantic segmentation. This multi-task design unifies these perception tasks in one compact architecture, improving the model's efficiency and accuracy across tasks.

Technical Insights

The SENSE framework features a modular design that employs a shared encoder for extracting features and separate decoders for each specific task. The shared encoder is built upon a ResNet-like architecture, incorporating pyramid pooling to enhance disparity estimation and semantic segmentation. This shared encoder design reduces redundancy and allows efficient feature reuse across tasks, contributing to the compactness and effectiveness of the overall model.
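
To illustrate what pyramid pooling adds, here is a simplified plain-Python sketch applied to a single-channel feature map. It is an assumption-laden toy rather than SENSE's module: the function name `pyramid_pool` and the bin sizes (1, 2, 4) are illustrative, and the learned 1×1 convolutions and bilinear upsampling of a real pyramid pooling module are replaced by plain averaging and nearest-neighbour copying. Each pooled map summarizes a progressively larger region around every pixel, giving the decoders global context that a single local receptive field lacks.

```python
from typing import List, Sequence, Tuple

Map2D = List[List[float]]


def _region_bounds(size: int, bins: int, idx: int) -> Tuple[int, int]:
    """Start/end indices of the idx-th of `bins` roughly equal slices of `size`."""
    return idx * size // bins, (idx + 1) * size // bins


def pyramid_pool(feature: Map2D, bin_sizes: Sequence[int] = (1, 2, 4)) -> List[Map2D]:
    """Return [feature, pooled_b1, pooled_b2, ...], each at the input resolution.

    For bin size b the map is split into a b x b grid; every pixel is replaced by
    the mean of its grid cell (average pooling + nearest-neighbour upsampling).
    Bin sizes here are hypothetical, not SENSE's configuration.
    """
    h, w = len(feature), len(feature[0])
    outputs = [feature]
    for b in bin_sizes:
        if b > h or b > w:
            raise ValueError("bin size larger than the feature map")
        pooled = [[0.0] * w for _ in range(h)]
        for i in range(b):
            r0, r1 = _region_bounds(h, b, i)
            for j in range(b):
                c0, c1 = _region_bounds(w, b, j)
                cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
                mean = sum(cells) / len(cells)
                for r in range(r0, r1):
                    for c in range(c0, c1):
                        pooled[r][c] = mean
        outputs.append(pooled)
    return outputs
```

Stacking the returned maps as extra channels lets each pixel see both its own feature and region-level averages at several scales.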

For optical flow, the network builds a 2D cost volume, since a pixel may move in any direction between consecutive frames; for disparity, it builds a 1D cost volume, since correspondences in a rectified stereo pair lie on the same image row. Occlusion and semantic segmentation are predicted from the same shared features: occlusion estimates mark pixels without a valid correspondence in the other image, and semantic labels supply object-level context. Both contribute to accurate scene flow prediction.
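
A minimal plain-Python sketch of the two kinds of cost volume, assuming features stored as nested `H x W x C` lists and correlation defined as the channel-averaged dot product (the normalization used in PWC-Net). The names `cost_volume_2d`, `cost_volume_1d`, and `max_disp` are hypothetical; real implementations work on GPU tensors, warp the second feature map by an upsampled coarse estimate at each pyramid level, and search only a small window. The point is the shape difference: a flow volume with search radius `r` stores `(2r+1)^2` costs per pixel, whereas a stereo volume with `D` candidate disparities stores only `D+1`.

```python
from typing import List

Feature = List[List[List[float]]]  # H x W x C
Volume = List[List[List[float]]]   # H x W x (number of candidate matches)


def _corr(a: List[float], b: List[float]) -> float:
    """Correlation of two feature vectors: dot product averaged over channels."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def cost_volume_2d(f1: Feature, f2: Feature, max_disp: int) -> Volume:
    """Flow: compare f1[y][x] with f2[y+dy][x+dx] for all |dx|, |dy| <= max_disp.

    Candidates are ordered dy-major, then dx; out-of-bounds candidates cost 0.
    """
    h, w = len(f1), len(f1[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            costs = []
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < h and 0 <= xx < w
                    costs.append(_corr(f1[y][x], f2[yy][xx]) if inside else 0.0)
            row.append(costs)
        out.append(row)
    return out


def cost_volume_1d(left: Feature, right: Feature, max_disp: int) -> Volume:
    """Stereo: compare left[y][x] with right[y][x - d] for d = 0..max_disp.

    Rectification keeps matches on the same row and shifted leftward, so the
    search is one-dimensional.
    """
    h, w = len(left), len(left[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            row.append([
                _corr(left[y][x], right[y][x - d]) if x - d >= 0 else 0.0
                for d in range(max_disp + 1)
            ])
        out.append(row)
    return out
```

Because the stereo search runs along one row in one direction, its cost grows linearly with the disparity range rather than quadratically with the search radius.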

Numerical Results and Performance

SENSE achieves state-of-the-art results on several optical flow benchmarks and compares favorably with the state of the art on stereo disparity and scene flow. It runs as fast as networks designed solely for optical flow, while consuming much less memory than competing stereo and scene flow methods. The results on MPI Sintel and KITTI support the robustness and adaptability of the shared-encoder approach.

On the KITTI scene flow benchmark, SENSE, with optional refinement modules, keeps inference fast and compares favorably with other leading methods on the benchmark's main metrics. This suggests that combining semantic-level understanding with pixel-correspondence estimation can improve scene flow accuracy.
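
KITTI 2015 reports scene flow accuracy as outlier percentages: a disparity or flow estimate counts as an outlier when its error exceeds both 3 px and 5% of the ground-truth magnitude, and a pixel is a scene flow (SF) outlier if its first-frame disparity (D1), second-frame disparity (D2), or optical flow (Fl) is an outlier. The plain-Python sketch below computes these rates. It assumes every listed pixel has ground truth (the real benchmark evaluates sparse ground-truth masks and reports background, foreground, and all regions separately), and the function names are hypothetical, not part of the official devkit.

```python
import math
from typing import Dict, Sequence, Tuple


def is_outlier(err: float, gt_magnitude: float,
               abs_thresh: float = 3.0, rel_thresh: float = 0.05) -> bool:
    """True when the error exceeds both the absolute (px) and relative thresholds."""
    return err > abs_thresh and err > rel_thresh * gt_magnitude


def scene_flow_outlier_rates(
    d1_pred: Sequence[float], d1_gt: Sequence[float],
    d2_pred: Sequence[float], d2_gt: Sequence[float],
    fl_pred: Sequence[Tuple[float, float]], fl_gt: Sequence[Tuple[float, float]],
) -> Dict[str, float]:
    """Fraction of pixels that are D1, D2, Fl and SF outliers (dense GT assumed)."""
    n = len(d1_gt)
    counts = {"D1": 0, "D2": 0, "Fl": 0, "SF": 0}
    for i in range(n):
        o_d1 = is_outlier(abs(d1_pred[i] - d1_gt[i]), abs(d1_gt[i]))
        o_d2 = is_outlier(abs(d2_pred[i] - d2_gt[i]), abs(d2_gt[i]))
        (u, v), (gu, gv) = fl_pred[i], fl_gt[i]
        # Flow error is the end-point error between 2D displacement vectors.
        o_fl = is_outlier(math.hypot(u - gu, v - gv), math.hypot(gu, gv))
        counts["D1"] += int(o_d1)
        counts["D2"] += int(o_d2)
        counts["Fl"] += int(o_fl)
        counts["SF"] += int(o_d1 or o_d2 or o_fl)  # any wrong component => SF outlier
    return {k: c / n for k, c in counts.items()}
```

Note that the relative threshold makes large displacements more forgiving: a 4 px disparity error is not an outlier when the true disparity is 96 px, because 4 px is below 5% of 96.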

Implications and Speculations

Integrating multiple tasks in one network can improve performance on tasks that benefit from shared features, as in autonomous driving, where understanding scene dynamics is crucial. Because task decoders attach to a shared encoder, the design is extensible: additional decoders can be added during training without significantly altering the core architecture.
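
The extensibility argument can be made concrete with a toy registry in which one encoder feeds any number of named decoders. The class `SharedEncoderModel`, its methods, and the callables in the usage lines are hypothetical stand-ins: in SENSE the encoder and decoders are convolutional networks, and the flow and disparity decoders consume features from two images rather than one. The sketch only shows the control flow: the encoder runs once per input, adding a task means registering another decoder, and at inference time only the requested heads are evaluated.

```python
from typing import Any, Callable, Dict, Iterable, Optional


class SharedEncoderModel:
    """Toy multi-task model: one shared encoder, any number of named task decoders."""

    def __init__(self, encoder: Callable[[Any], Any]) -> None:
        self.encoder = encoder
        self.decoders: Dict[str, Callable[[Any], Any]] = {}
        self.encoder_calls = 0  # counts encoder passes, showing features are reused

    def add_decoder(self, task: str, decoder: Callable[[Any], Any]) -> None:
        """Attach a new task head without modifying the encoder or other heads."""
        if task in self.decoders:
            raise ValueError(f"decoder for {task!r} already registered")
        self.decoders[task] = decoder

    def __call__(self, inputs: Any, tasks: Optional[Iterable[str]] = None) -> Dict[str, Any]:
        """Encode once, then run only the requested decoders (all of them by default)."""
        self.encoder_calls += 1
        features = self.encoder(inputs)
        wanted = list(self.decoders) if tasks is None else list(tasks)
        return {t: self.decoders[t](features) for t in wanted}


if __name__ == "__main__":
    model = SharedEncoderModel(encoder=lambda xs: [2 * x for x in xs])
    model.add_decoder("flow", sum)          # stand-in for a flow decoder
    model.add_decoder("segmentation", max)  # stand-in for a segmentation decoder
    print(model([1, 2, 3]))                    # {'flow': 12, 'segmentation': 6}
    print(model([1, 2, 3], tasks=["segmentation"]))  # {'segmentation': 6}
```

Evaluating only the heads a deployment needs is one way a shared-encoder design can stay compact at inference time.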

Adding distillation and self-supervised losses to the supervised losses lets the network learn from partially labeled data, a common situation with real-world datasets. This points to wider use of semi-supervised training for tasks where labels are scarce or expensive to obtain, which could also advance unsupervised scene understanding.
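
To make the idea of mixing loss types concrete, here is a hypothetical plain-Python sketch of a combined objective for partially labeled data. It is not SENSE's exact formulation: the function names, the loss weights, the use of a scalar per-pixel flow value, and the choice to distill segmentation from a teacher's soft predictions are all illustrative assumptions. What it shows is the masking logic: the supervised term counts only pixels that have ground truth, the self-supervised photometric term counts only pixels predicted to be non-occluded (occluded pixels have no valid correspondence to compare against), and the distillation term supplies a training signal where human labels are missing.

```python
import math
from typing import Optional, Sequence


def masked_l1(pred: Sequence[float], target: Sequence[float],
              mask: Sequence[bool]) -> float:
    """Mean absolute error over pixels where mask is True (0.0 if none qualify)."""
    terms = [abs(p - t) for p, t, m in zip(pred, target, mask) if m]
    return sum(terms) / len(terms) if terms else 0.0


def soft_cross_entropy(teacher: Sequence[Sequence[float]],
                       student: Sequence[Sequence[float]],
                       eps: float = 1e-8) -> float:
    """Mean over pixels of -sum_k teacher[k] * log(student[k])."""
    total = 0.0
    for t_probs, s_probs in zip(teacher, student):
        total -= sum(t * math.log(s + eps) for t, s in zip(t_probs, s_probs))
    return total / len(teacher)


def combined_loss(
    flow_pred: Sequence[float],
    flow_gt: Sequence[Optional[float]],  # None where no ground truth exists
    img_target: Sequence[float],         # first-image intensities
    img_warped: Sequence[float],         # second image warped by the predicted flow (assumed given)
    non_occluded: Sequence[bool],        # predicted visibility mask
    seg_student: Sequence[Sequence[float]],  # student class probabilities per pixel
    seg_teacher: Sequence[Sequence[float]],  # hypothetical teacher soft labels per pixel
    w_sup: float = 1.0, w_photo: float = 0.5, w_distill: float = 0.1,  # hypothetical weights
) -> float:
    has_gt = [g is not None for g in flow_gt]
    gt_filled = [0.0 if g is None else g for g in flow_gt]
    supervised = masked_l1(flow_pred, gt_filled, has_gt)
    photometric = masked_l1(img_warped, img_target, non_occluded)
    distill = soft_cross_entropy(seg_teacher, seg_student)
    return w_sup * supervised + w_photo * photometric + w_distill * distill
```

Because each term is masked independently, a frame with no flow labels still contributes through the photometric and distillation terms.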

Concluding Remarks

This paper is a significant step toward holistic scene understanding: it shows that sharing feature extraction across closely related tasks yields a more compact model and improved predictive accuracy. SENSE's strength lies in its unified design, which keeps the model small while learning feature representations that serve every task, paving the way for more versatile machine vision systems. Future work could explore new architectures or adaptive feature-sharing strategies to improve generalization and real-time performance across application domains.