Stitching for Neuroevolution: Recombining Deep Neural Networks without Breaking Them (2403.14224v1)

Published 21 Mar 2024 in cs.NE

Abstract: Traditional approaches to neuroevolution often start from scratch. This becomes prohibitively expensive in terms of computational and data requirements when targeting modern, deep neural networks. Using a warm start could be highly advantageous, e.g., using previously trained networks, potentially from different sources. Moreover, this enables leveraging the benefits of transfer learning (in particular, vastly reduced training effort). However, recombining trained networks is non-trivial because architectures and feature representations typically differ. Consequently, a straightforward exchange of layers tends to lead to a performance breakdown. We overcome this by matching the layers of parent networks based on their connectivity, identifying potential crossover points. To correct for differing feature representations between these layers we employ stitching, which merges the networks by introducing new layers at crossover points. To train the merged network, only stitching layers need to be considered. New networks can then be created by selecting a subnetwork, i.e., by choosing which stitching layers to use (or not). Assessing their performance is efficient as only their evaluation on data is required. We experimentally show that our approach enables finding networks that represent novel trade-offs between performance and computational cost, with some even dominating the original networks.
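The core idea of stitching described above can be illustrated with a minimal toy sketch. This is a hypothetical simplification, not the paper's implementation: the paper stitches layers of deep CNNs matched by connectivity, whereas here each "parent" is reduced to a single frozen linear feature layer, and the stitching layer is a linear map fitted in closed form by least squares (standing in for training only the stitching layers while the parents stay frozen).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two frozen "parent" networks, each reduced here to one linear
# feature layer. (Toy stand-ins; the paper uses deep CNNs.)
d_in, d_hidden = 4, 8
W_A = rng.normal(size=(d_hidden, d_in))  # parent A's feature layer
W_B = rng.normal(size=(d_hidden, d_in))  # parent B's feature layer

def features_A(x):
    return x @ W_A.T

def features_B(x):
    return x @ W_B.T

# Fit the stitching layer: a map S that translates parent A's
# representation into parent B's representation space. Only S is
# "trained" (here via least squares); both parents stay frozen.
X = rng.normal(size=(256, d_in))
H_A, H_B = features_A(X), features_B(X)
S, *_ = np.linalg.lstsq(H_A, H_B, rcond=None)

def stitched(x):
    # Run parent A's early layers, translate at the crossover
    # point, then continue through parent B's later layers.
    return features_A(x) @ S

# On held-out data the stitched features should closely match
# parent B's, so B's downstream layers can be reused unchanged.
X_test = rng.normal(size=(32, d_in))
err = np.max(np.abs(stitched(X_test) - features_B(X_test)))
print(f"max stitching error: {err:.2e}")
```

In this linear toy case an exact translation exists, so the error is near machine precision; with real nonlinear networks the stitching layers (e.g., 1x1 convolutions) are trained by gradient descent and the match is only approximate.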
