k-Winners-Take-All Ensemble Neural Network (2401.02092v1)
Abstract: Ensembling is one approach to improving the performance of a neural network: a number of independently trained networks are combined, usually by averaging or summing their individual outputs. We modify this ensembling approach by training the sub-networks concurrently instead of independently. This concurrent training leads the sub-networks to cooperate with one another, and we refer to the resulting model as a "cooperative ensemble". The mixture-of-experts approach, in contrast, improves performance by dividing a given dataset among its sub-networks, using a gating network that assigns a specialization to each sub-network, called an "expert". We improve on these ways of combining a group of neural networks by using a k-Winners-Take-All (kWTA) activation function, which acts as the combination method for the outputs of the sub-networks in the ensemble. We refer to this proposed model as the "kWTA ensemble neural network" (kWTA-ENN). With the kWTA activation function, the losing neurons of the sub-networks are inhibited while the winning neurons are retained. As a result, the sub-networks develop some form of specialization while still sharing knowledge with one another. We compare our approach against the cooperative ensemble and the mixture-of-experts, using as the sub-network architecture a feed-forward neural network with one hidden layer of 100 neurons. Our approach outperforms these baseline models, reaching the following test accuracies on benchmark datasets: 98.34% on MNIST, 88.06% on Fashion-MNIST, 91.56% on KMNIST, and 95.97% on WDBC.
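The following is a minimal sketch of how a kWTA activation could serve as the combination step for an ensemble's sub-network outputs, as described in the abstract. The function names (`kwta`, `kwta_ensemble_combine`), the choice to pool all sub-network outputs before applying kWTA, and the per-class summation afterwards are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kwta(z, k):
    """Generic k-Winners-Take-All: keep the k largest activations
    (the 'winners') and zero out ('inhibit') the rest."""
    out = np.zeros_like(z)
    winners = np.argsort(z)[-k:]  # indices of the k largest values
    out[winners] = z[winners]
    return out

def kwta_ensemble_combine(sub_outputs, k):
    """Hypothetical combination step: pool the class scores of all
    sub-networks, apply kWTA over the pooled units, then sum the
    surviving activations per class to form the ensemble output."""
    pooled = np.concatenate(sub_outputs)  # shape: (num_subnets * num_classes,)
    survived = kwta(pooled, k)
    num_classes = sub_outputs[0].shape[0]
    return survived.reshape(len(sub_outputs), num_classes).sum(axis=0)

# Toy usage: 3 sub-networks, 4 classes, keep the 4 strongest units overall.
rng = np.random.default_rng(0)
sub_outputs = [rng.normal(size=4) for _ in range(3)]
print(kwta_ensemble_combine(sub_outputs, k=4))
```

Because only the winning units contribute to the combined output, each sub-network's strongest responses survive while its weaker ones are suppressed, which is the intuition behind the specialization-with-knowledge-sharing behavior the abstract describes.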
- Breiman, Leo. “Stacked regressions.” Machine Learning 24.1 (1996): 49-64.
- Freund, Yoav, and Robert E. Schapire. “Experiments with a New Boosting Algorithm.” ICML. Vol. 96. 1996.
- Hansen, Lars Kai, and Peter Salamon. “Neural network ensembles.” IEEE Transactions on Pattern Analysis and Machine Intelligence 12.10 (1990): 993-1001.
- Jordan, Michael I., and Robert A. Jacobs. “Hierarchies of adaptive experts.” Advances in Neural Information Processing Systems. 1992.
- LeCun, Yann. “The MNIST database of handwritten digits.” http://yann.lecun.com/exdb/mnist/ (1998).
- Liu, Yong, and Xin Yao. “A cooperative ensemble learning system.” 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No. 98CH36227). Vol. 3. IEEE, 1998.
- Majani, E., Ruth Erlanson, and Yaser Abu-Mostafa. “On the K-winners-take-all network.” Advances in Neural Information Processing Systems (1989): 634-642.
- Qian, Ning. “On the momentum term in gradient descent learning algorithms.” Neural Networks 12.1 (1999): 145-151.
- Rumelhart, David E., and David Zipser. “Feature Discovery by Competitive Learning.” Cognitive Science 9.1 (1985): 75-112.
- Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. “Learning representations by back-propagating errors.” Nature 323.6088 (1986): 533-536.
- Schapire, Robert E. “The strength of weak learnability.” Machine Learning 5.2 (1990): 197-227.
- Wolberg, William H., W. Nick Street, and Olvi L. Mangasarian. “Breast Cancer Wisconsin (Diagnostic) Data Set.” UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/] (1992).
- Xiao, Han, Kashif Rasul, and Roland Vollgraf. “Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.” arXiv preprint arXiv:1708.07747 (2017).