
Is Second-order Information Helpful for Large-scale Visual Recognition? (1703.08050v3)

Published 23 Mar 2017 in cs.CV

Abstract: By stacking layers of convolution and nonlinearity, convolutional networks (ConvNets) effectively learn from low-level to high-level features and discriminative representations. Since the end goal of large-scale recognition is to delineate complex boundaries of thousands of classes, adequate exploration of feature distributions is important for realizing full potentials of ConvNets. However, state-of-the-art works concentrate only on deeper or wider architecture design, while rarely exploring feature statistics higher than first-order. We take a step towards addressing this problem. Our method consists in covariance pooling, instead of the most commonly used first-order pooling, of high-level convolutional features. The main challenges involved are robust covariance estimation given a small sample of large-dimensional features and usage of the manifold structure of covariance matrices. To address these challenges, we present a Matrix Power Normalized Covariance (MPN-COV) method. We develop forward and backward propagation formulas regarding the nonlinear matrix functions such that MPN-COV can be trained end-to-end. In addition, we analyze both qualitatively and quantitatively its advantage over the well-known Log-Euclidean metric. On the ImageNet 2012 validation set, by combining MPN-COV we achieve over 4%, 3% and 2.5% gains for AlexNet, VGG-M and VGG-16, respectively; integration of MPN-COV into 50-layer ResNet outperforms ResNet-101 and is comparable to ResNet-152. The source code will be available on the project page: http://www.peihuali.org/MPN-COV

Authors (4)
  1. Peihua Li (18 papers)
  2. Jiangtao Xie (10 papers)
  3. Qilong Wang (34 papers)
  4. Wangmeng Zuo (279 papers)
Citations (245)

Summary

  • The paper presents MPN-COV, which harnesses second-order statistics to boost feature discrimination in convolutional networks.
  • It employs covariance pooling and matrix power normalization, reducing top-1 error by up to 4% on models such as AlexNet, VGG-M, and VGG-16.
  • The study bridges statistical and geometric feature representations, suggesting a path toward more efficient and robust network architectures.

Second-order Information in Large-scale Visual Recognition

The paper "Is Second-order Information Helpful for Large-scale Visual Recognition?" by Peihua Li et al. explores the utilization of second-order statistics in convolutional networks (ConvNets) for visual recognition. The work presents the Matrix Power Normalized Covariance (MPN-COV) method, which marks a shift from the conventional reliance on first-order pooling techniques. This research addresses the underexplored potential of higher-order feature statistics in ConvNet architectures, aiming to bolster the network's discriminatory capabilities without merely increasing depth or width.

Methodological Innovations

MPN-COV differentiates itself by employing covariance pooling of high-level convolutional features, rather than traditional first-order methods. The technique faces two main challenges: robust covariance estimation from high-dimensional features with few samples, and proper handling of the manifold structure of covariance matrices. To address these, the authors derive forward and backward propagation formulas for the nonlinear matrix functions at the core of MPN-COV, enabling end-to-end training. Matrix power normalization is shown to approximately respect the Riemannian geometry of the covariance manifold; in contrast to the well-known Log-Euclidean metric, the authors argue it better preserves the significance of eigenvalue magnitudes within the feature distribution.
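To make the pooling step concrete, the forward pass can be sketched as follows: compute the sample covariance of the spatial feature vectors, then apply a matrix power via eigendecomposition. This is a minimal illustration, not the authors' implementation; the function name `mpn_cov`, the `eps` clamp, and the default exponent of 1/2 (the value the paper reports working well) are assumptions for this sketch.

```python
import numpy as np

def mpn_cov(features, alpha=0.5, eps=1e-6):
    """Sketch of Matrix Power Normalized Covariance (MPN-COV) pooling.

    features: (n, d) array of n spatial feature vectors of dimension d
              (e.g., the flattened h*w locations of a conv feature map).
    alpha:    power exponent; the paper finds 1/2 effective.
    Returns a (d, d) symmetric matrix, the power-normalized covariance.
    """
    n, d = features.shape
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / n                  # (d, d) sample covariance
    # Matrix power through eigendecomposition; cov is symmetric PSD,
    # so eigh applies and the power acts on the eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, eps, None)            # guard tiny negative values
    return eigvecs @ np.diag(eigvals ** alpha) @ eigvecs.T
```

In practice the upper triangle of the resulting matrix would be vectorized and fed to the classifier; end-to-end training additionally requires the backward formulas for the eigendecomposition, which the paper derives.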

Empirical Evaluation

The paper reports consistent gains in recognition accuracy. On the ImageNet 2012 validation set, MPN-COV reduces top-1 error by over 4%, 3%, and 2.5% for AlexNet, VGG-M, and VGG-16, respectively, relative to the baseline architectures. The 50-layer ResNet with MPN-COV outperforms the 101-layer ResNet and performs comparably to the 152-layer variant. These results underscore the value of second-order statistics for enhancing the feature representation capabilities of ConvNets on large-scale datasets.

Theoretical and Practical Implications

Theoretically, the work bridges statistical and geometric properties of feature representations by leveraging robust covariance estimation techniques, like the von Neumann divergence-based maximum likelihood estimation. It further establishes connections between classical estimator regularization and MPN-COV's approach, affirming the shrinkage principle's relevance in ConvNets. Practically, MPN-COV presents an informed alternative to simply increasing network complexity. This could prompt the development of more computationally efficient architectures, potentially expanding the applicability of ConvNets in resource-constrained environments.
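The qualitative contrast with the Log-Euclidean metric can be illustrated numerically: both normalizations act on the covariance eigenvalues, but the power map stays bounded and non-negative as eigenvalues approach zero, while the logarithm diverges, amplifying small-eigenvalue noise. The specific eigenvalue values below are illustrative, not taken from the paper.

```python
import numpy as np

# Illustrative eigenvalue spectrum spanning several orders of magnitude,
# as arises with high-dimensional features and few spatial samples.
eigvals = np.array([1e-8, 1e-4, 1e-2, 1.0, 100.0])

power = eigvals ** 0.5   # MPN: compresses large eigenvalues, stays >= 0
logeig = np.log(eigvals) # Log-Euclidean: diverges toward -inf near zero
```

The power map thus behaves like a shrinkage estimator, pulling eigenvalues toward one while preserving their order, which connects to the classical regularization perspective the authors invoke.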

Future Directions

The paper leaves several avenues for future exploration. These include extending MPN-COV to varied architectural frameworks, such as the Inception models, and investigating its impact in diverse applications beyond image classification, like object detection and scene categorization. Additionally, understanding how MPN-COV interacts with different types of convolutional operations and its behavior across diverse datasets could yield further insights.

In summary, this paper presents a rigorous examination of second-order feature statistics' role in large-scale visual recognition, offering empirical evidence of their effectiveness. The MPN-COV method stands out as a viable pathway to enhance ConvNet performance by leveraging higher-order statistics, indicating a promising direction for future AI research.