Compact Bilinear Pooling
Bilinear models have proven highly effective in a variety of visual tasks such as semantic segmentation, fine-grained recognition, and face recognition. Their fundamental drawback is the high dimensionality of the resulting feature representations, often hundreds of thousands to millions of dimensions, which makes them impractical for further processing and analysis. The authors introduce two compact bilinear representations, Random Maclaurin (RM) and Tensor Sketch (TS), which preserve the discriminative power of full bilinear representations while reducing feature dimensionality to a few thousand dimensions.
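To make the idea concrete, below is a minimal NumPy sketch of Tensor Sketch: each descriptor is Count-Sketched twice with independent random hash functions, and the two sketches are combined by circular convolution computed via the FFT. The dimensions `c` and `d` here are illustrative choices, not values prescribed by the paper.

```python
# Minimal NumPy sketch of Tensor Sketch (TS); dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def make_count_sketch_params(c, d):
    """Fixed random hash indices h: [c] -> [d] and signs s: [c] -> {-1, +1}."""
    return rng.integers(0, d, size=c), rng.choice([-1.0, 1.0], size=c)

def count_sketch(x, h, s, d):
    """Count Sketch: scatter-add the signed entries of x into d buckets."""
    psi = np.zeros(d)
    np.add.at(psi, h, s * x)
    return psi

def tensor_sketch(x, h1, s1, h2, s2, d):
    """TS(x) = IFFT(FFT(psi1(x)) * FFT(psi2(x))): a d-dimensional
    approximation of the c*c-dimensional outer product x x^T."""
    fft1 = np.fft.fft(count_sketch(x, h1, s1, d))
    fft2 = np.fft.fft(count_sketch(x, h2, s2, d))
    return np.real(np.fft.ifft(fft1 * fft2))

c, d = 512, 4096                     # e.g. conv-layer channels -> compact dim
h1, s1 = make_count_sketch_params(c, d)
h2, s2 = make_count_sketch_params(c, d)
x, y = rng.standard_normal(c), rng.standard_normal(c)

# The inner product of two sketches approximates the quadratic kernel <x, y>^2
print(np.dot(tensor_sketch(x, h1, s1, h2, s2, d),
             tensor_sketch(y, h1, s1, h2, s2, d)), np.dot(x, y) ** 2)
```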
Methodological Contributions
The paper presents substantial methodological contributions:
- Compact Bilinear Pooling Methods: Two novel compact bilinear pooling methods, RM and TS, are introduced, which reduce feature dimensionality by three orders of magnitude with minimal loss in discriminative power compared to full bilinear pooling.
- Efficient Back-Propagation: The compact representations admit gradients, so the entire visual recognition pipeline can be optimized end-to-end via back-propagation.
- Kernelized Viewpoint: A kernelized analysis shows that bilinear pooling corresponds to a second-order polynomial kernel, which both RM and TS approximate with low-dimensional feature maps; this offers theoretical grounding and a foundation for further work on compact pooling (see the RM sketch and kernel check after this list).
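A useful fact behind both methods: for a single descriptor, the full bilinear feature B(x) = vec(x x^T) satisfies <B(x), B(y)> = <x, y>^2, so any compact feature whose inner products approximate this quadratic kernel can stand in for it. The snippet below is a minimal NumPy sketch of the RM projection with an empirical check of that property; the random +/-1 matrices follow the Random Maclaurin construction, while the dimensions are illustrative.

```python
# Minimal NumPy sketch of Random Maclaurin (RM); dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
c, d = 512, 4096

# Two fixed random +/-1 projection matrices (optionally fine-tuned later)
W1 = rng.choice([-1.0, 1.0], size=(d, c))
W2 = rng.choice([-1.0, 1.0], size=(d, c))

def random_maclaurin(x):
    """phi(x) = (W1 x) * (W2 x) / sqrt(d), so E[<phi(x), phi(y)>] = <x, y>^2."""
    return (W1 @ x) * (W2 @ x) / np.sqrt(d)

x, y = rng.standard_normal(c), rng.standard_normal(c)
print(np.dot(random_maclaurin(x), random_maclaurin(y)))  # approx <x, y>^2
print(np.dot(x, y) ** 2)                                 # exact quadratic kernel
```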
Implementation and Performance
The authors implement their compact bilinear pooling methods within convolutional neural networks (CNNs) for image classification. Experiments span several datasets, including CUB-200-2011, MIT Indoor Scene Recognition, and the Describable Textures Dataset (DTD), comparing the methods against full bilinear pooling, fully connected layers, and improved Fisher Vector encoding.
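As a rough sketch of how this fits into a CNN head (reusing `tensor_sketch()` and the hash parameters from the first snippet): each spatial location of the final convolutional feature map is sketched and the results are sum-pooled over space, followed by the signed square root and L2 normalization standard for bilinear features. The feature-map shape below is illustrative.

```python
# Sketch of a compact bilinear pooling head over a conv feature map,
# reusing tensor_sketch() and (h1, s1, h2, s2, d) from the earlier snippet.
import numpy as np

def compact_bilinear_pool(feature_map, h1, s1, h2, s2, d):
    """feature_map: (H, W, c) activations -> (d,) compact bilinear feature."""
    H, W, c = feature_map.shape
    pooled = np.zeros(d)
    for descriptor in feature_map.reshape(-1, c):  # sum TS over H*W locations
        pooled += tensor_sketch(descriptor, h1, s1, h2, s2, d)
    z = np.sign(pooled) * np.sqrt(np.abs(pooled))  # signed square root
    return z / (np.linalg.norm(z) + 1e-12)         # L2 normalization

fmap = np.random.default_rng(1).standard_normal((28, 28, 512))  # e.g. conv5
print(compact_bilinear_pool(fmap, h1, s1, h2, s2, d).shape)     # (4096,)
```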
Dimensionality and Fine-tuning
Experiments reveal that:
- Bilinear and compact bilinear pooling outperform fully connected layers and improved Fisher Vector encoding by a significant margin.
- TS with more than 8,000 dimensions matches the performance of the full bilinear representation at roughly 250,000 dimensions, indicating that the full representation is highly redundant.
- Fine-tuning the projection parameters yields modest additional gains, especially at lower projection dimensions (a learnable variant is sketched below).
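Fine-tuning here means treating the projection parameters as weights of an ordinary layer. Below is a hypothetical PyTorch module, not the authors' code, illustrating the RM variant, whose real-valued projection matrices W1 and W2 are straightforwardly learnable by back-propagation; all names and sizes are illustrative.

```python
# Hypothetical PyTorch sketch of a learnable RM pooling layer.
import torch
import torch.nn as nn

class RandomMaclaurinPool(nn.Module):
    def __init__(self, c=512, d=4096):
        super().__init__()
        # Initialize with random +/-1 entries, then let gradients refine them.
        self.W1 = nn.Parameter(torch.randint(0, 2, (d, c)).float() * 2 - 1)
        self.W2 = nn.Parameter(torch.randint(0, 2, (d, c)).float() * 2 - 1)
        self.d = d

    def forward(self, x):              # x: (batch, H*W, c) local descriptors
        p = (x @ self.W1.T) * (x @ self.W2.T) / self.d ** 0.5
        z = p.sum(dim=1)               # sum-pool over spatial locations
        z = torch.sign(z) * torch.sqrt(torch.abs(z) + 1e-12)
        return nn.functional.normalize(z, dim=1)

feats = torch.randn(8, 28 * 28, 512, requires_grad=True)
out = RandomMaclaurinPool()(feats)     # (8, 4096)
out.sum().backward()                   # gradients flow to W1 and W2
```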
Cross-dataset Comparison
Compact bilinear pooling methods generalize well across a variety of image recognition tasks:
- On CUB-200-2011 for fine-grained visual categorization, TS pooling attains performance on par with full bilinear pooling after fine-tuning.
- For scene recognition on the MIT Indoor dataset, the TS method outperforms improved Fisher Vector encoding.
- On the DTD texture classification benchmark, TS consistently achieves lower error rates than competing methods.
Few-shot Learning
Few-shot learning experiments indicate that the compact bilinear representations are particularly effective:
- TS pooling achieves a 22.8% relative improvement in classification performance over full bilinear pooling when limited to one training sample per class.
- The compact representation retains its advantage as the number of training samples increases, underscoring its suitability for scenarios with limited labeled data.
Implications and Future Directions
The compact bilinear pooling methods introduced offer multiple practical advantages:
- They substantially reduce memory and storage requirements, making them suitable for deployment in memory-constrained environments, such as embedded systems.
- The reduced feature dimensionality facilitates efficient storage and retrieval in image databases, which is crucial for image retrieval applications (a toy lookup is sketched after this list).
- Incorporating alternative kernel functions into deep learning frameworks via similar compact approximations opens avenues for enhancing other visual recognition tasks.
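As a toy illustration of the retrieval point above (not an experiment from the paper): once every database image is represented by a few-thousand-dimensional L2-normalized vector, nearest-neighbor lookup reduces to a single matrix-vector product. All sizes below are arbitrary.

```python
# Toy nearest-neighbor retrieval over compact features; sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 4096)).astype(np.float32)  # one row per image
db /= np.linalg.norm(db, axis=1, keepdims=True)              # L2-normalize rows

query = db[42] + 0.01 * rng.standard_normal(4096).astype(np.float32)
query /= np.linalg.norm(query)

top5 = np.argsort(db @ query)[-5:][::-1]   # cosine-similarity ranking
print(top5)                                # image 42 should rank first
```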
In conclusion, the paper presents compelling evidence that compact bilinear pooling, particularly TS, offers a robust and efficient alternative to full bilinear pooling, maintaining high discriminative power while dramatically reducing dimensionality. The work lays a solid foundation for future exploration and practical application of compact bilinear models in both research and practice.