
k-Winners-Take-All Ensemble Neural Network (2401.02092v1)

Published 4 Jan 2024 in cs.NE and cs.AI

Abstract: Ensembling is one approach to improving the performance of a neural network by combining a number of independent neural networks, usually by averaging or summing their individual outputs. We modify this ensembling approach by training the sub-networks concurrently instead of independently. This concurrent training leads the sub-networks to cooperate with each other, and we refer to them as a "cooperative ensemble". Meanwhile, the mixture-of-experts approach improves the performance of a neural network by dividing a given dataset among its sub-networks. It then uses a gating network that assigns a specialization to each of its sub-networks, called "experts". We improve on these ways of combining a group of neural networks by using a k-Winners-Take-All (kWTA) activation function that acts as the combination method for the outputs of the sub-networks in the ensemble. We refer to this proposed model as a "kWTA ensemble neural network" (kWTA-ENN). With the kWTA activation function, the losing neurons of the sub-networks are inhibited while the winning neurons are retained. As a result, the sub-networks develop some form of specialization while still sharing knowledge with one another. We compare our approach with the cooperative ensemble and the mixture-of-experts, using a feed-forward neural network with one hidden layer of 100 neurons as the sub-network architecture. Our approach outperforms the baseline models, reaching the following test accuracies on benchmark datasets: 98.34% on MNIST, 88.06% on Fashion-MNIST, 91.56% on KMNIST, and 95.97% on WDBC.
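To make the combination step concrete, the minimal sketch below applies a kWTA activation to pooled sub-network outputs. It assumes the outputs are pooled by summation before kWTA is applied, and the function names (`kwta`, `kwta_combine`) are illustrative, not taken from the paper.

```python
import numpy as np

def kwta(z, k):
    """Keep the k largest activations (the 'winners') and zero out the rest."""
    winners = np.argsort(z)[-k:]       # indices of the k winning neurons
    out = np.zeros_like(z)
    out[winners] = z[winners]          # losing neurons inhibited, winners retained
    return out

def kwta_combine(sub_outputs, k):
    """Combine sub-network outputs with kWTA instead of plain averaging/summing.

    sub_outputs: list of 1-D arrays, one per sub-network (e.g. class scores).
    Pooling by summation is an assumption made for this sketch.
    """
    pooled = np.sum(sub_outputs, axis=0)
    return kwta(pooled, k)

# Toy usage: three sub-networks emitting 10-class scores, keeping the top 3 units.
rng = np.random.default_rng(0)
outputs = [rng.normal(size=10) for _ in range(3)]
print(kwta_combine(outputs, k=3))
```

Because only the top-k pooled units survive, each sub-network's strongest responses can dominate different output units, which reflects the specialization-with-shared-knowledge behaviour described in the abstract.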

References (13)
  1. Breiman, Leo. “Stacked regressions.” Machine Learning 24.1 (1996): 49-64.
  2. Freund, Yoav, and Robert E. Schapire. “Experiments with a New Boosting Algorithm.” ICML. Vol. 96. 1996.
  3. Hansen, Lars Kai, and Peter Salamon. “Neural network ensembles.” IEEE Transactions on Pattern Analysis and Machine Intelligence 12.10 (1990): 993-1001.
  4. Jordan, Michael I., and Robert A. Jacobs. “Hierarchies of adaptive experts.” Advances in Neural Information Processing Systems. 1992.
  5. LeCun, Yann. “The MNIST database of handwritten digits.” http://yann.lecun.com/exdb/mnist/ (1998).
  6. Liu, Yong, and Xin Yao. “A cooperative ensemble learning system.” 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No. 98CH36227). Vol. 3. IEEE, 1998.
  7. Majani, E., Ruth Erlanson, and Yaser Abu-Mostafa. “On the K-Winners-Take-All Network.” Advances in Neural Information Processing Systems (1989): 634-642.
  8. Qian, Ning. “On the momentum term in gradient descent learning algorithms.” Neural Networks 12.1 (1999): 145-151.
  9. Rumelhart, David E., and David Zipser. “Feature Discovery by Competitive Learning.” Cognitive Science 9.1 (1985): 75-112.
  10. Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. “Learning representations by back-propagating errors.” Nature 323.6088 (1986): 533-536.
  11. Schapire, Robert E. “The strength of weak learnability.” Machine Learning 5.2 (1990): 197-227.
  12. Wolberg, William H., W. Nick Street, and Olvi L. Mangasarian. “Breast cancer Wisconsin (diagnostic) data set.” UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/] (1992).
  13. Xiao, Han, Kashif Rasul, and Roland Vollgraf. “Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.” arXiv preprint arXiv:1708.07747 (2017).