Indirectly Parameterized Concrete Autoencoders (2403.00563v2)
Abstract: Feature selection is a crucial task in settings where data is high-dimensional or acquiring the full set of features is costly. Recent developments in neural network-based embedded feature selection show promising results across a wide range of applications. Concrete Autoencoders (CAEs), considered state-of-the-art in embedded feature selection, may struggle to achieve stable joint optimization, hurting their training time and generalization. In this work, we identify that this instability is correlated with the CAE learning duplicate selections. To remedy this, we propose a simple and effective improvement: Indirectly Parameterized CAEs (IP-CAEs). IP-CAEs learn an embedding and a mapping from it to the Gumbel-Softmax distributions' parameters. Despite being simple to implement, IP-CAE exhibits significant and consistent improvements over CAE in both generalization and training time across several datasets for reconstruction and classification. Unlike CAE, IP-CAE effectively leverages non-linear relationships and does not require retraining the jointly optimized decoder. Furthermore, our approach is, in principle, generalizable to Gumbel-Softmax distributions beyond feature selection.
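The core mechanism is easy to sketch. Below is a minimal, hypothetical PyTorch illustration of the indirect parameterization idea as described in the abstract, not the authors' implementation: instead of learning the selector's Gumbel-Softmax logits directly (one logit row per selected feature, as in a standard CAE), the layer learns an embedding matrix and a mapping from it to the logits. All names (`IPConcreteSelector`, `embed_dim`, the linear mapping) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IPConcreteSelector(nn.Module):
    """Sketch of an indirectly parameterized concrete (Gumbel-Softmax) selector.

    A plain CAE selector would learn the (n_select, n_features) logit matrix
    directly; here the logits are produced from a learned embedding via a
    learned mapping (a linear map in this sketch).
    """

    def __init__(self, n_features: int, n_select: int, embed_dim: int = 64):
        super().__init__()
        # Indirect parameterization: embedding + mapping to logits.
        self.embedding = nn.Parameter(torch.randn(n_select, embed_dim))
        self.to_logits = nn.Linear(embed_dim, n_features)

    def forward(self, x: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
        # x: (batch, n_features)
        logits = self.to_logits(self.embedding)  # (n_select, n_features)
        # One relaxed one-hot distribution over input features per selection.
        weights = F.gumbel_softmax(logits, tau=temperature, hard=False)
        return x @ weights.t()                   # (batch, n_select)
```

At inference time one would typically take a hard argmax over each logit row (or pass `hard=True`) to recover a discrete feature subset, and the temperature is usually annealed during training, as is standard for Gumbel-Softmax relaxations.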