ASC: Adaptive Scale Feature Map Compression for Deep Neural Network (2312.08176v1)
Abstract: Deep-learning accelerators are increasingly in demand; however, their performance is constrained by the size of the feature map, leading to high bandwidth requirements and large buffer sizes. We propose an adaptive scale feature map compression technique that leverages the unique properties of the feature map. The technique adopts independent channel indexing, given the weak correlation across channels, and a cubical-like block shape to exploit strong local correlations. It further optimizes compression with a switchable endpoint mode and adaptive scale interpolation to handle unimodal data distributions, both with and without outliers. This yields a 4$\times$ compression rate at a constant bitrate and up to 7.69$\times$ at a variable bitrate for 16-bit data. Our hardware design minimizes area cost by adjusting interpolation scales, which enables hardware sharing among interpolation points. We also introduce a threshold concept for straightforward interpolation, avoiding the need for intricate hardware. A TSMC 28nm implementation of the 8-bit version has an equivalent gate count of 6135. The architecture also scales effectively, with only a sublinear increase in area cost: a 32$\times$ throughput increase, matching the theoretical bandwidth of DDR5-6400, is achieved at just 7.65$\times$ the hardware cost.
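As a rough illustration of the fixed-rate regime described above, the sketch below shows the generic endpoint-interpolation family of block compression: two endpoints are stored per block, and each element is replaced by a small index selecting an interpolated point between them. The 2x2x4 block shape, 2-bit indices, and the NumPy reference code are assumptions chosen so the arithmetic works out to 4$\times$ for 16-bit data; they are not the paper's actual bit layout, endpoint modes, or adaptive scale selection.

```python
import numpy as np

def encode_block(block: np.ndarray, index_bits: int = 2):
    """Encode one feature-map block with min/max endpoint interpolation.

    Illustrative sketch only: the two endpoints are kept at full precision
    and each element is replaced by an index selecting one of
    2**index_bits evenly spaced points between them.
    """
    lo, hi = float(block.min()), float(block.max())
    levels = 2 ** index_bits
    if hi == lo:
        idx = np.zeros(block.shape, dtype=np.uint8)
    else:
        # Map each value to its nearest interpolation point between lo and hi.
        idx = np.round((block - lo) / (hi - lo) * (levels - 1)).astype(np.uint8)
    return lo, hi, idx

def decode_block(lo: float, hi: float, idx: np.ndarray,
                 index_bits: int = 2) -> np.ndarray:
    """Reconstruct the block by interpolating between the two endpoints."""
    levels = 2 ** index_bits
    return lo + (hi - lo) * idx.astype(np.float32) / (levels - 1)

# A hypothetical 2x2x4 "cubical-like" block of sixteen 16-bit activations.
rng = np.random.default_rng(0)
block = rng.integers(0, 2**16, size=(2, 2, 4)).astype(np.float32)
lo, hi, idx = encode_block(block)
recon = decode_block(lo, hi, idx)

# Storage budget under these illustrative assumptions:
#   2 endpoints * 16 bits + 16 elements * 2-bit indices = 64 bits,
#   versus 16 * 16 = 256 bits uncompressed -> 4x fixed-rate compression.
```

The variable-rate gains beyond this fixed 4$\times$ come from the paper's switchable endpoint modes and adaptive scale interpolation, which this sketch does not model.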