- The paper introduces Self Attention Distillation (SAD), which guides lightweight CNNs with their own internal attention maps to improve lane detection.
- The method achieves performance comparable to or exceeding state-of-the-art approaches on benchmarks such as CULane while cutting parameter count and speeding up inference.
- The study highlights that SAD enhances attention map quality and paves the way for efficient real-time applications in autonomous driving.
Overview of "Learning Lightweight Lane Detection CNNs by Self Attention Distillation"
Introduction and Motivation
Lane detection is a critical component in autonomous driving, yet it faces significant challenges due to subtle and sparse supervisory signals inherent in lane annotations. This paper introduces a novel approach called Self Attention Distillation (SAD) to enhance the performance of lightweight Convolutional Neural Networks (CNNs) for lane detection without additional labels or supervision. The primary observation is that attention maps from a well-trained model encode useful contextual information. SAD leverages these maps for improved representation learning by performing layer-wise attention distillation within the network itself.
Methodology
The proposed method, SAD, enhances the learning capacity of lightweight models by utilizing their own attention maps as supervisory signals. SAD is compatible with any feedforward CNN and does not increase inference time, making it a practical addition to existing lane detection frameworks. The approach involves:
- Attention Map Utilization: SAD employs activation-based attention maps extracted from the network's own intermediate layers once training has progressed far enough for those maps to be meaningful. These maps serve as distillation targets for neighboring layers (see the sketch after this list).
- Layer-wise Distillation: The network's earlier layers are guided by the attention maps of deeper layers through a top-down distillation process. This layer-wise knowledge transfer enhances the network's ability to capture rich scene context.
- Training and Loss Function: The total training loss is a weighted combination of a per-pixel segmentation loss, a lane-existence classification loss, and the attention distillation loss (sketched below).
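The following is a minimal PyTorch sketch of the core mechanism, assuming squared-activation channel pooling followed by spatial softmax normalization, the style of activation-based attention the paper builds on; the function names and the exact normalization choices here are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor, out_size) -> torch.Tensor:
    """Activation-based attention: collapse channels by averaging squared
    activations, resize to a common resolution, then normalize spatially."""
    # feat: (N, C, H, W) -> (N, 1, H, W)
    amap = feat.pow(2).mean(dim=1, keepdim=True)
    # Match the spatial size of the layer whose map it will be compared to.
    amap = F.interpolate(amap, size=out_size, mode="bilinear", align_corners=False)
    n, _, h, w = amap.shape
    # Spatial softmax so maps from different layers are on a comparable scale.
    amap = F.softmax(amap.view(n, -1), dim=1).view(n, 1, h, w)
    return amap

def sad_loss(feats):
    """Top-down SAD: each block mimics the attention map of its successor.
    `feats` is a list of intermediate feature maps, ordered shallow to deep."""
    loss = 0.0
    for shallow, deep in zip(feats[:-1], feats[1:]):
        target_size = shallow.shape[-2:]
        src = attention_map(shallow, target_size)
        # The deeper layer's map is the target; block gradients through it.
        tgt = attention_map(deep, target_size).detach()
        loss = loss + F.mse_loss(src, tgt)
    return loss
```

Because the distillation targets come from the same forward pass, no extra branch exists at test time, which is why SAD adds zero inference cost.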
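And a hedged sketch of how the three loss terms might be combined, reusing `sad_loss` and the imports from the block above; the weights `ALPHA` and `BETA` are placeholders, since the summary does not restate the paper's exact coefficients.

```python
ALPHA, BETA = 0.1, 0.1  # hypothetical weights, not the paper's values

def total_loss(seg_logits, seg_labels, exist_logits, exist_labels, feats):
    """Segmentation + lane-existence + SAD distillation, mirroring the loss
    structure described above."""
    seg = F.cross_entropy(seg_logits, seg_labels)
    exist = F.binary_cross_entropy_with_logits(exist_logits, exist_labels)
    distill = sad_loss(feats)  # from the previous sketch
    return seg + ALPHA * exist + BETA * distill
```

One practical detail from the paper: SAD is switched on only partway through training, once the deeper layers' attention maps have become informative enough to serve as useful targets.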
Experimental Results
SAD was validated on three popular lane detection benchmarks: TuSimple, CULane, and BDD100K, using lightweight backbones such as ENet and ResNet-18/34. The results were noteworthy:
- Performance: ENet-SAD matched or exceeded state-of-the-art algorithms while using far fewer parameters and running faster. On CULane, ENet-SAD performed comparably to SCNN with 20 times fewer parameters and 10 times faster inference.
- Attention Map Analysis: Visualizations demonstrated that SAD improved the quality of attention maps, leading to better focus on lane-relevant features.
Implications and Future Directions
The introduction of SAD opens new avenues for enhancing the performance of CNNs in tasks requiring fine-grained attention to detail. Its integration into lightweight models offers a cost-effective solution for real-time applications like autonomous driving. The framework's ability to improve attention without additional data or supervision suggests potential adaptations for other tasks such as image saliency detection and image matting.
Further research could explore the scalability of SAD in larger, more complex networks and its application to other computer vision domains. Additionally, the impact of varying distillation paths and configurations presents an interesting area for optimization.
Conclusion
The "Learning Lightweight Lane Detection CNNs by Self Attention Distillation" paper presents a substantial contribution to the lane detection field by leveraging intra-network attention distillation. The demonstrated balance between model accuracy and computational efficiency highlights the practical viability of SAD in real-world autonomous driving systems.