CST: Calibration Side-Tuning for Parameter and Memory Efficient Transfer Learning (2402.12736v1)

Published 20 Feb 2024 in cs.CV and cs.AI

Abstract: Achieving universally high accuracy in object detection is challenging, so the industry currently focuses mainly on detecting specific classes of objects. However, deploying one or more object detection networks requires GPU memory for training and storage capacity for inference, which makes it difficult to coordinate multiple object detection tasks under resource-constrained conditions. This paper introduces a lightweight fine-tuning strategy called Calibration Side-Tuning (CST), which integrates aspects of adapter tuning and side tuning to adapt techniques that have proven successful in transformers for use with ResNet. The CST architecture incorporates maximal transition calibration, using a small number of additional parameters to enhance network performance while maintaining a smooth training process. Furthermore, this paper analyzes multiple fine-tuning strategies and implements them within ResNet, thereby extending research on fine-tuning strategies for object detection networks. Extensive experiments on five benchmark datasets demonstrate that the proposed method outperforms other state-of-the-art techniques and achieves a better balance between the complexity and performance of fine-tuning schemes.
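The abstract does not spell out the CST layers themselves, but the general side-tuning idea it builds on can be sketched: a frozen pretrained backbone path is fused with a small trainable side branch, so only a few extra parameters are learned. The sketch below is a generic, hypothetical NumPy illustration of that fusion pattern (the class name, dimensions, and scalar gate are assumptions for illustration, not the paper's actual CST design).

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


class SideTunedBlock:
    """Hypothetical side-tuning block: frozen backbone path + small side path."""

    def __init__(self, dim=8, side_dim=2):
        # Frozen backbone weights (in practice, a pretrained ResNet stage).
        self.W_backbone = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        # Lightweight trainable side branch: far fewer parameters.
        self.W_down = rng.standard_normal((dim, side_dim)) / np.sqrt(dim)
        self.W_up = rng.standard_normal((side_dim, dim)) / np.sqrt(side_dim)
        # Learned fusion gate (a single scalar here for simplicity).
        self.alpha = 0.5

    def forward(self, x):
        f_backbone = relu(x @ self.W_backbone)       # frozen path, no updates
        f_side = relu(x @ self.W_down) @ self.W_up   # trainable side path
        # Gated additive fusion of the two feature streams.
        return self.alpha * f_backbone + (1.0 - self.alpha) * f_side


block = SideTunedBlock()
x = rng.standard_normal((4, 8))  # a batch of 4 feature vectors
y = block.forward(x)
print(y.shape)  # (4, 8)

# Only the side branch (and the gate) would be trained.
n_side = block.W_down.size + block.W_up.size + 1
n_backbone = block.W_backbone.size
print(n_side, n_backbone)  # 33 trainable vs 64 frozen
```

The parameter count shows the efficiency argument: the trainable side path and gate are much smaller than the frozen backbone, which is what makes such schemes parameter- and memory-efficient.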
