
Multi-scale Unified Network for Image Classification (2403.18294v1)

Published 27 Mar 2024 in cs.CV

Abstract: Convolutional Neural Networks (CNNs) have advanced significantly in visual representation learning and recognition. However, they face notable challenges in performance and computational efficiency when handling real-world, multi-scale image inputs. Conventional methods rescale all inputs to a fixed size: a larger fixed size favors accuracy, but rescaling small images up to that size introduces digitization noise and increases computation cost. In this work, we carry out a comprehensive, layer-wise investigation of how CNN models respond to scale variation, based on Centered Kernel Alignment (CKA) analysis. The observations reveal that lower layers are more sensitive to input scale variation than high-level layers. Inspired by this insight, we propose the Multi-scale Unified Network (MSUN), consisting of multi-scale subnets, a unified network, and a scale-invariant constraint. Our method divides the shallow layers into multi-scale subnets that extract features from multi-scale inputs, and the resulting low-level features are unified in the deep layers to extract high-level semantic features. A scale-invariant constraint is imposed to maintain feature consistency across different scales. Extensive experiments on ImageNet and other scale-diverse datasets demonstrate that MSUN achieves significant improvements in both model performance and computational efficiency. In particular, MSUN yields accuracy gains of up to 44.53% and reduces FLOPs by 7.01-16.13% in multi-scale scenarios.
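The layer-wise analysis rests on Centered Kernel Alignment, a similarity index between two sets of activations for the same inputs. As a reference point, the widely used linear form of CKA can be sketched as follows (this is the standard formulation, not code from the paper):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices of shape
    (n_samples, n_features). Returns a value in [0, 1]; higher
    means the two representations are more similar."""
    # Center each feature dimension across samples
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    hsic = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return hsic / (norm_x * norm_y)
```

Comparing a layer's activations for the same images at two input resolutions with this index is one way to reproduce the paper's observation that shallow layers diverge across scales more than deep ones; CKA's invariance to isotropic scaling and orthogonal transforms makes it well suited to such cross-condition comparisons.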
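The architecture the abstract describes (scale-specific shallow subnets feeding a shared deep network, regularized by a scale-invariant constraint) can be sketched schematically. All names here are hypothetical stand-ins for the paper's components, and the L2 form of the constraint is an assumption; the abstract only specifies that deep features should stay consistent across scales:

```python
import numpy as np

def msun_forward(x_small, x_large, subnet_small, subnet_large, unified_net):
    """Schematic MSUN-style forward pass: each input scale goes through
    its own shallow subnet, then both share the deep 'unified' layers."""
    f_small = subnet_small(x_small)   # scale-specific low-level features
    f_large = subnet_large(x_large)
    z_small = unified_net(f_small)    # shared high-level semantic features
    z_large = unified_net(f_large)
    # Scale-invariant constraint: penalize disagreement between deep
    # features of the same image at different scales (L2 is assumed).
    loss_si = float(np.mean((z_small - z_large) ** 2))
    return z_small, z_large, loss_si
```

The design intuition follows directly from the CKA finding: since only the shallow layers are scale-sensitive, only they are duplicated per scale, which is why the method can cut FLOPs while improving multi-scale accuracy.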

Authors (3)
  1. Wenzhuo Liu (18 papers)
  2. Fei Zhu (49 papers)
  3. Cheng-Lin Liu (71 papers)

