- The paper introduces a novel complex-valued CNN framework that leverages phase information to enhance robustness and reduce overfitting.
- It employs Wirtinger calculus and magnitude-based pooling to adapt traditional CNN components to complex-valued operations.
- Empirical tests on cell detection reveal that complex CNNs match the classification accuracy of real-valued networks while achieving lower test loss, i.e., less overfitting.
Complex Valued Convolutional Neural Networks: A Theoretical and Empirical Exploration
The paper "On Complex Valued Convolutional Neural Networks" by Nitzan Guberman presents a novel approach to designing convolutional neural networks (CNNs) using complex numbers, motivated by the inherent properties of natural images and previous insights on signal representation using complex numbers. This research addresses both theoretical formulation and empirical evaluation, with a focus on decreasing the vulnerability of CNNs to overfitting, thereby enhancing model generalization.
Theoretical Foundations and Model Construction
The paper establishes a grounded approach to complex CNNs through a comprehensive theoretical framework. The author proposes the use of complex numbers for input data and network weights, leveraging the potential of complex representations to capture phase information in images—a characteristic often ignored in real-valued networks. By generalizing traditional CNN components to the complex domain, the paper attempts to exploit beneficial properties of complex signals, such as synchronization and signal coherence, to improve CNN robustness.
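The claim that phase carries essential image structure can be illustrated with a small numpy sketch (not from the paper): by the Fourier shift theorem, translating an image leaves the magnitude of its 2-D spectrum unchanged, so all positional information lives in the phase.

```python
import numpy as np

# Sketch: the phase of the 2-D Fourier transform carries spatial structure.
# Circularly shifting an image leaves the spectrum's magnitude unchanged;
# the shift is absorbed entirely by the phase (Fourier shift theorem).
rng = np.random.default_rng(0)
img = rng.random((32, 32))
shifted = np.roll(img, shift=(3, 5), axis=(0, 1))

F, Fs = np.fft.fft2(img), np.fft.fft2(shifted)
print(np.allclose(np.abs(F), np.abs(Fs)))        # True: magnitudes agree
print(np.allclose(np.angle(F), np.angle(Fs)))    # False: phase moved
```

A real-valued network that discards phase thus throws away exactly the information that localizes structure in the image.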
Key challenges arise due to the non-orderable nature of complex numbers, which complicates the definition of activation functions like ReLU and operations such as max pooling. The paper suggests using variations such as phase-sensitive activation functions and magnitude-based pooling to adapt these mechanisms to complex numbers effectively. Wirtinger calculus is employed to define gradients for backpropagation in complex-valued settings, providing a method to train these networks using gradient descent techniques.
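Magnitude-based pooling sidesteps the lack of an ordering on complex numbers by comparing moduli instead of values. A minimal numpy sketch (the function name and window handling are illustrative, not the paper's code):

```python
import numpy as np

def magnitude_max_pool(z, k=2):
    """Pool a complex feature map: in each k x k window, keep the
    entry with the largest magnitude, preserving its phase."""
    h, w = z.shape
    out = np.empty((h // k, w // k), dtype=z.dtype)
    for i in range(h // k):
        for j in range(w // k):
            win = z[i * k:(i + 1) * k, j * k:(j + 1) * k].ravel()
            out[i, j] = win[np.argmax(np.abs(win))]  # argmax over |.|
    return out

z = np.array([[1 + 1j, 0.5j],
              [2 - 1j, 0.1]])
print(magnitude_max_pool(z))  # [[2.-1.j]], since |2-1j| is largest
```

Because the selected activation is returned with its phase intact, the operation remains well defined while still behaving like max pooling on activation strength.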
Regularization through Complex Architecture
The proposed complex CNN can be viewed as a restricted form of a real-valued CNN, using half as many free parameters as its unconstrained real counterpart. This restriction naturally serves as a form of regularization, potentially reducing overfitting by limiting the hypothesis space to models that inherently capture and utilize phase structures in the data. Complex CNNs are therefore posited to perform well on problems where phase information is crucial, positioning them as a specialized tool rather than a universal replacement for traditional CNNs.
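The parameter restriction is easy to see at the level of a single weight: a complex weight w = a + bi acting on z = x + yi is equivalent to a real 2x2 matrix constrained to the form [[a, -b], [b, a]], so it has two free entries where an unconstrained real layer would have four. A numpy check of this equivalence:

```python
import numpy as np

# A complex multiply w * z, viewed over the reals, is a 2x2 matrix
# constrained to [[a, -b], [b, a]]: 2 free parameters instead of 4.
a, b = 1.5, -0.7
W = np.array([[a, -b],
              [b,  a]])            # constrained real form of w = a + bi
z = np.array([0.3, 2.0])           # (Re z, Im z)

w = complex(a, b)
wz = w * complex(z[0], z[1])
print(np.allclose(W @ z, [wz.real, wz.imag]))  # True
```

Applied layer-wide, this halving of free parameters is exactly the hypothesis-space restriction that the paper interprets as built-in regularization.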
Empirical Evaluation on Cell Detection
The empirical assessment is conducted on a dataset of simulated fluorescence microscopy images to perform cell detection from image patches. The experimental setup involves constructing gradient images from these patches, which encode rich phase information suitable for evaluation with the proposed complex CNN model.
Guberman empirically demonstrates that while both complex and real networks achieve comparable classification accuracies, the complex CNN shows markedly greater resistance to overfitting. The real network, despite achieving superior accuracy on the training set, exhibits higher test loss, indicative of overfitting. This aligns with the hypothesis that complex networks inherently regularize through their architectural constraints.
Additionally, the author highlights computational difficulties in training complex CNNs due to convergence issues and sensitivity to initialization—a notable limitation that requires further investigation and potential methodological advancements.
Future Directions and Implications
This work opens avenues for further exploration into the application of complex CNNs, particularly in domains where phase information is paramount. Future investigations could seek more efficient training algorithms that better handle the numerical instability observed in complex CNN training processes. By refining these models and expanding their application to different types of data, such as optical flow or audio signals, researchers can systematically evaluate the broader applicability and utility of complex architectures in supervised learning contexts.
Overall, while this paper ventures into a relatively novel yet challenging area, it lays a promising foundation for the potential benefits that complex-valued architectures can bring to deep learning models, especially under scenarios involving rich, phase-structured data.