- The paper presents an end-to-end framework that jointly optimizes shadow detection and removal using a stacked CGAN architecture.
- The method employs dual generator-discriminator pairs to leverage inter-task information and capture global scene context.
- Empirical results on the ISTD, SBU, and UCF datasets show significant reductions in Balance Error Rate for detection and lower RMSE for removal compared with existing methods.
Overview of Stacked Conditional Generative Adversarial Networks for Shadow Processing
This paper proposes a novel framework, the Stacked Conditional Generative Adversarial Network (ST-CGAN), that addresses two crucial aspects of shadow processing in images: shadow detection and shadow removal. The innovation lies in the joint approach, which folds both tasks into a single, cohesive system. This multi-task perspective exploits the mutual benefits between shadow detection and shadow removal, an approach not previously explored in the literature.
The system comprises two interconnected CGANs, each with its own generator-discriminator pair. The first CGAN handles shadow detection, generating a shadow mask from an input shadow image. That mask, concatenated with the original image, then feeds the second CGAN, which removes the shadows and reconstructs the shadow-free image. Using two discriminators encourages the networks to capture higher-level relationships and global scene characteristics, which aids the accurate execution of both tasks.
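As a rough sketch of this stacked data flow (not the paper's implementation — the real generators are learned encoder-decoder networks, and `g1_detect`/`g2_remove` here are toy placeholder functions), the first stage maps an image to a mask, and the second stage consumes the channel-wise concatenation of image and mask:

```python
import numpy as np

def g1_detect(shadow_img):
    """Placeholder for the first generator: predict an H x W x 1 shadow mask.
    Toy heuristic only: treat dark pixels as shadow."""
    luminance = shadow_img.mean(axis=-1, keepdims=True)
    return (luminance < 0.5).astype(np.float32)

def g2_remove(shadow_img, mask):
    """Placeholder for the second generator: reconstruct a shadow-free image.
    The real G2 consumes the image/mask concatenation shown below."""
    g2_input = np.concatenate([shadow_img, mask], axis=-1)  # H x W x 4 input
    # Toy relighting: brighten the masked pixels (illustration only).
    return np.clip(shadow_img + 0.4 * mask, 0.0, 1.0)

img = np.random.rand(8, 8, 3).astype(np.float32)
mask = g1_detect(img)               # first CGAN: shadow mask
shadow_free = g2_remove(img, mask)  # second CGAN: shadow-free image
print(mask.shape, shadow_free.shape)  # (8, 8, 1) (8, 8, 3)
```

In the full model, each stage's output is also judged by its own conditional discriminator, which is what pushes both generators toward globally consistent predictions.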
Key Contributions
The paper introduces several noteworthy contributions:
- End-to-End Framework: The proposed ST-CGAN is an end-to-end framework that concurrently trains and optimizes shadow detection and removal operations. This integration allows the system to improve shadow understanding by internalizing global scene characteristics essential for these tasks.
- Stacked Joint Learning Paradigm: The architecture features a unique stacked paradigm differing significantly from traditional multi-branch models. The design is inspired by DenseNet connectivity patterns, which efficiently utilize the outputs of preceding tasks as inputs for subsequent tasks, promoting progressive task enhancement and mutual reinforcement.
- Benchmark Dataset: To train and evaluate the proposed framework, the paper introduces a new dataset, ISTD, containing 1,870 image triplets of shadow images, shadow masks, and shadow-free images across diverse scenes.
Numerical and Empirical Insights
The ST-CGAN performs strongly across several datasets, both newly introduced and publicly available. For shadow detection, it achieves a significant reduction in Balance Error Rate (BER) when trained and evaluated on datasets such as SBU and UCF. For shadow removal, it attains lower RMSE values than state-of-the-art methods, indicating that it removes shadows accurately without severely degrading the non-shadow regions of an image.
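The two metrics above are standard and easy to state precisely. A minimal sketch of both, assuming binary masks for BER and plain arrays for RMSE (shadow-removal papers typically report RMSE per pixel in LAB colour space, which is omitted here):

```python
import numpy as np

def balance_error_rate(pred, gt):
    """BER = 1 - 0.5 * (TP/(TP+FN) + TN/(TN+FP)) on binary shadow masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)    # shadow pixels correctly detected
    tn = np.sum(~pred & ~gt)  # non-shadow pixels correctly rejected
    fn = np.sum(~pred & gt)   # shadow pixels missed
    fp = np.sum(pred & ~gt)   # non-shadow pixels flagged as shadow
    return 1.0 - 0.5 * (tp / (tp + fn) + tn / (tn + fp))

def rmse(a, b):
    """Root-mean-square error between two images."""
    a, b = a.astype(np.float64), b.astype(np.float64)
    return np.sqrt(np.mean((a - b) ** 2))

gt   = np.array([[1, 1, 0, 0]])
pred = np.array([[1, 0, 0, 0]])
print(balance_error_rate(pred, gt))  # 0.25: one of two shadow pixels missed
```

BER weights the shadow and non-shadow classes equally, which matters because shadow pixels are usually a small fraction of each image.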
Implications and Future Directions
The results underscore the advantages of integrating detection and removal within a unified system: each component benefits from shared representations and end-to-end learning. In practical settings, ST-CGAN could substantially improve visual quality for downstream computer vision tasks such as object detection or scene understanding.
Looking forward, this work lays a foundation for extending such multi-task learning paradigms beyond shadow processing to other multi-faceted visual problems in AI research.
In conclusion, the proposed framework offers a robust pathway for comprehensive shadow processing, supporting the idea that intertwined image-processing tasks can achieve higher performance through cohesive learning architectures.