Compressed Image Captioning using CNN-based Encoder-Decoder Framework (2404.18062v1)
Abstract: In today's world, image processing plays a crucial role across many fields, from scientific research to industrial applications. One particularly exciting application is image captioning. The potential impact of effective image captioning is vast: it can significantly boost the accuracy of search engines, making it easier to find relevant information, and it can greatly enhance accessibility for visually impaired individuals, providing them with a more immersive experience of digital content. Despite its promise, however, image captioning presents several challenges. One major hurdle is extracting meaningful visual information from images and transforming it into coherent language, which requires bridging the gap between the visual and linguistic domains, a task that demands sophisticated algorithms and models. Our project addresses these challenges by developing an automatic image captioning architecture that combines the strengths of convolutional neural networks (CNNs) and encoder-decoder models. A CNN extracts visual features from images, and an encoder-decoder framework then generates captions from those features. We also conducted a performance comparison of several pre-trained CNN backbones to understand how the choice of architecture affects caption quality. Finally, we explored the integration of frequency regularization techniques to compress the AlexNet and EfficientNetB0 models, investigating whether the compressed models could remain as effective at generating image captions while being more resource-efficient.
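To make the described pipeline concrete, the sketch below shows one common way such a CNN-plus-encoder-decoder captioner can be wired together in PyTorch: a pre-trained EfficientNet-B0 backbone from torchvision acts as the visual encoder and an LSTM serves as the caption decoder. This is a minimal, assumed illustration of the general technique, not the authors' actual implementation; all class names, dimensions, and hyperparameters here are illustrative choices.

```python
# Minimal sketch (assumed, not the paper's exact code) of a CNN encoder +
# LSTM decoder image captioning model built on a pre-trained backbone.
import torch
import torch.nn as nn
from torchvision import models


class CNNEncoder(nn.Module):
    """Extracts a fixed-size visual feature vector from an image."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        backbone = models.efficientnet_b0(weights="IMAGENET1K_V1")
        # Keep the convolutional feature extractor; drop the classifier head.
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(1280, embed_dim)  # 1280 = EfficientNet-B0 output channels

    def forward(self, images: torch.Tensor) -> torch.Tensor:  # (B, 3, H, W)
        x = self.pool(self.features(images)).flatten(1)       # (B, 1280)
        return self.fc(x)                                      # (B, embed_dim)


class CaptionDecoder(nn.Module):
    """LSTM decoder that generates a caption conditioned on the image feature."""
    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_feat: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # Prepend the image feature as the first "token" of the input sequence,
        # then predict the next word at every step.
        tokens = self.embed(captions)                          # (B, T, embed_dim)
        inputs = torch.cat([image_feat.unsqueeze(1), tokens], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                                # (B, T+1, vocab_size)
```

Training would typically minimize cross-entropy between the decoder's next-word predictions and the reference captions (e.g., from MS COCO); swapping the backbone for AlexNet or a frequency-regularized compressed variant only changes the encoder's feature extractor and its output dimensionality.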
Authors: Md Alif Rahman Ridoy, M Mahmud Hasan, Shovon Bhowmick