Leveraging Systematic Knowledge of 2D Transformations

Published 2 Jun 2022 in cs.CV and cs.LG (arXiv:2206.00893v2)

Abstract: Existing deep learning models suffer from an out-of-distribution (o.o.d.) performance drop in computer vision tasks. In contrast, humans interpret images remarkably well, even when the depicted scenes are rare, thanks to the systematicity of their acquired knowledge. This work focuses on 1) the acquisition of systematic knowledge of 2D transformations, and 2) architectural components that can leverage the learned knowledge in image classification tasks in an o.o.d. setting. With a new training methodology based on synthetic datasets constructed under a causal framework, deep neural networks acquire knowledge from semantically different domains (e.g. even from noise) and exhibit a certain level of systematicity in parameter-estimation experiments. Building on this, a novel architecture is devised consisting of a classifier, an estimator, and an identifier (abbreviated as "CED"). By emulating the "hypothesis-verification" process in human visual perception, CED significantly improves classification accuracy on test sets under covariate shift.
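
The training methodology can be illustrated with a minimal sketch. Under the causal framing, the transformation parameters are sampled first (the causes) and then applied to an arbitrary source image, even pure noise, to produce the observed image (the effect); an estimator network would then be trained to regress the parameters from the image pair, independent of image semantics. The sampling ranges and function names below are illustrative assumptions, not the authors' exact protocol.

```python
# Sketch: synthetic (image pair, parameters) data for 2D transformations.
# Parameters are sampled first (causes), then applied to a source image
# (here pure noise) to yield the observed image (effect).
import numpy as np
from scipy.ndimage import rotate, shift

rng = np.random.default_rng(0)

def sample_pair(size=32):
    """Return (source, transformed, parameters) for one training example."""
    base = rng.random((size, size))            # any source domain works, even noise
    angle = rng.uniform(-180.0, 180.0)         # rotation angle in degrees
    dx, dy = rng.uniform(-5.0, 5.0, size=2)    # translation in pixels
    img = rotate(base, angle, reshape=False, mode="nearest")
    img = shift(img, (dy, dx), mode="nearest")
    return base, img, np.array([angle, dx, dy])

# An estimator trained on such pairs learns the transformation itself,
# not the content of the images it is applied to.
src, obs, theta = sample_pair()
```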

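The "hypothesis-verification" loop that CED is described as emulating can likewise be sketched: the classifier proposes top-k class hypotheses, the estimator aligns the input with a canonical template for each hypothesis, and the identifier scores how well each alignment verifies the hypothesis. The module interfaces and the use of per-class templates below are assumptions for illustration, not the paper's exact design.

```python
# Sketch: hypothesis-verification inference with a Classifier, an
# Estimator, and an iDentifier ("CED"). All three are assumed to be
# trained torch modules; their call signatures are hypothetical.
import torch

def ced_predict(x, classifier, estimator, identifier, templates, k=3):
    """Classify image x (shape [1, C, H, W]) by verifying top-k hypotheses."""
    logits = classifier(x)                          # propose class hypotheses
    top = torch.topk(logits, k, dim=-1).indices[0]  # k candidate classes
    scores = []
    for c in top:
        t = templates[c]                            # canonical view of class c
        theta = estimator(x, t)                     # estimate 2D transform params
        scores.append(identifier(x, t, theta))      # verify the hypothesis
    return top[torch.argmax(torch.stack(scores))]   # best-verified class wins
```

The design intent, per the abstract, is that verification against systematically learned transformation knowledge compensates for the classifier's degradation under covariate shift.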