When Do Convolutional Neural Networks Stop Learning? (2403.02473v1)
Abstract: Convolutional Neural Networks (CNNs) have demonstrated outstanding performance in computer vision tasks such as image classification, detection, segmentation, and medical image analysis. In general, such networks are trained for an arbitrary number of epochs; in each epoch, the entire training set, divided into batches, is fed to the network. In practice, the validation error, together with the training loss, is used to estimate the network's generalization, which indicates its optimal learning capacity. The current practice is to stop training when the training loss keeps decreasing while the gap between training and validation error (the generalization gap) grows, in order to avoid overfitting. However, this is a trial-and-error approach, which raises a critical question: is it possible to estimate when a neural network stops learning from the training data alone? This work introduces a hypothesis that analyzes the data variation across all layers of a CNN variant to anticipate its near-optimal learning capacity. During training, we use this hypothesis to anticipate the near-optimal learning capacity of a CNN variant without using any validation data. Our hypothesis can be deployed as a plug-and-play module with any existing CNN variant, without introducing additional trainable parameters to the network. We test the hypothesis on six CNN variants and three general image datasets (CIFAR10, CIFAR100, and SVHN); across these combinations, it saves 58.49\% of training time on average. We further evaluate the hypothesis on ten medical image datasets and compare against the MedMNIST-V2 benchmark, saving $\approx$ 44.1\% of computational time without losing accuracy.
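The abstract does not spell out the variation measure, but the plug-and-play idea can be illustrated with a minimal sketch: record a per-layer statistic of the feature maps during training and stop once the layer-wise profile stabilizes from one epoch to the next. The PyTorch sketch below assumes the standard deviation of each convolutional layer's output as the variation statistic and a patience-based stability test; `attach_variation_hooks`, `STABILITY_TOL`, and `PATIENCE` are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of validation-free stopping via per-layer activation variation.
# The variation metric and thresholds are assumptions; the paper's exact
# criterion is not given in the abstract.

import torch
import torch.nn as nn

def attach_variation_hooks(model):
    """Register forward hooks that record the std of each conv layer's output."""
    stats = {}
    def make_hook(name):
        def hook(module, inputs, output):
            stats[name] = output.detach().std().item()
        return hook
    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules()
               if isinstance(m, nn.Conv2d)]
    return stats, handles

STABILITY_TOL = 1e-3   # assumed threshold on epoch-to-epoch change
PATIENCE = 3           # assumed number of stable epochs before stopping

def train_with_variation_stopping(model, loader, optimizer, loss_fn, max_epochs=100):
    stats, handles = attach_variation_hooks(model)
    prev_profile, stable_epochs = None, 0
    for epoch in range(max_epochs):
        for x, y in loader:                 # ordinary supervised training step
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
        # Snapshot the layer-wise variation profile after this epoch.
        profile = torch.tensor([stats[k] for k in sorted(stats)])
        if prev_profile is not None:
            delta = (profile - prev_profile).abs().max().item()
            stable_epochs = stable_epochs + 1 if delta < STABILITY_TOL else 0
            if stable_epochs >= PATIENCE:   # variation has plateaued: stop
                break
        prev_profile = profile
    for h in handles:
        h.remove()
    return epoch
```

Because the hooks only read activations, a criterion of this kind adds no trainable parameters and attaches to any existing CNN, which matches the plug-and-play property the abstract claims.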
Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. 
[2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. 
[2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). 
http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. 
Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. 
[2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) Belkin et al. [2019] Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854 (2019) Sinha et al. [2020] Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020) Long et al. [2015] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. [2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. 
IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854 (2019) Sinha et al. [2020] Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020) Long et al. [2015] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. [2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. 
Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. 
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Sinha, S., Garg, A., Larochelle, H.: Curriculum by smoothing. Advances in Neural Information Processing Systems 33 (2020) Long et al. [2015] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. 
[2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. 
Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. [2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. 
Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. 
[2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. 
arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. 
IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. 
[2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? 
IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. 
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? 
IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. 
[2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. 
Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. 
[2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. 
Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. 
[2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. 
[2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. 
arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Redmon et al. [2016] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. 
[2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. 
arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. 
IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? 
IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. 
Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. 
[2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. 
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. 
[2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. 
[2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. 
[2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). 
http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. 
Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. 
[2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. 
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. 
[2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. 
Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System Sciences (HICSS 2022) (2022)
Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp.
8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 
111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Yadav and Jadhav [2019] Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. 
[2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yadav, S.S., Jadhav, S.M.: Deep convolutional neural network based medical image classification for disease diagnosis. Journal of Big Data 6(1), 1–18 (2019) Altaf et al. [2019] Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. 
[2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. 
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. 
IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. 
[2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. 
[2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. 
[2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. 
[2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. 
[2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. 
Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. 
arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. 
arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? 
IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. 
[2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. 
Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. 
[2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. 
Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. 
[2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. 
[2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. 
[2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? 
IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. 
[2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. 
Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. 
[2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. 
Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. 
[2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. 
[2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. 
[2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. 
Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. 
arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. 
arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Altaf, F., Islam, S.M., Akhtar, N., Janjua, N.K.: Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access 7, 99540–99572 (2019) Barata et al. [2013] Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. 
arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE systems Journal 8(3), 965–979 (2013) Riaz et al. [2015] Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. 
[2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. 
arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Riaz, F., Hassan, A., Nisar, R., Dinis-Ribeiro, M., Coimbra, M.T.: Content-adaptive region-based color texture descriptors for medical images. IEEE journal of biomedical and health informatics 21(1), 162–171 (2015) Raj et al. [2020] Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. 
Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Raj, R.J.S., Shobana, S.J., Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Shankar, K.: Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access 8, 58006–58017 (2020) Zhang and He [2020] Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. 
Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Zhang, M., He, Y.: Accelerating training of transformer-based language models with progressive layer dropping. In: NeurIPS (2020) Tajbakhsh et al. [2016] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. 
Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. 
arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. 
arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. 
Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. 
[2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. 
[2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. 
[2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. 
In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. 
Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. 
[2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. 
arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. 
[2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Khalifa, M., Islam, A.: Will your forthcoming book be successful? Predicting book success with CNN and readability scores. In: 55th Hawaii International Conference on System Sciences (HICSS 2022) (2022)
- Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020)
- Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020)
- Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
- Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
- Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
- Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
- Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
- Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
- Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
- Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
- Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
- Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
- Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
- Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
- Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
- Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
- Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
- Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
- Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
- Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
- Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
- Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35(5), 1299–1312 (2016) Piergiovanni and Ryoo [2020] Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. 
In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 
8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 
111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. 
[2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. 
https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. 
[2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. 
arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. 
arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
[2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Khalifa, M., Islam, A.: Will your forthcoming book be successful? Predicting book success with CNN and readability scores. In: 55th Hawaii International Conference on System Sciences (HICSS 2022) (2022)
- Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020)
- Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020)
- Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
- Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
- Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
- Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
- Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
- Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
- Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
- Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
- Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
- Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
- Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
- Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
- Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
- Huang, G., Liu, S., van der Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
- Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
- Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
- Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
- Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
- Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
- Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
[2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Piergiovanni, A., Ryoo, M.S.: Avid dataset: Anonymized videos from diverse countries. Advances in Neural Information Processing Systems (2020) Peng et al. [2020] Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: Pyglove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020) Khalifa and Islam [2022] Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. 
[2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Khalifa, M., Islam, A.: Will your forthcoming book be successful? predicting book success with cnn and readability scores. In: 55th Hawaii International Conference on System ScienceKarevs (HICSS 2022) (2022) Li et al. [2020] Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. 
In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. 
arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. 
arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. 
[2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. 
- Peng, D., Dong, X., Real, E., Tan, M., Lu, Y., Bender, G., Liu, H., Kraft, A., Liang, C., Le, Q.: PyGlove: Symbolic programming for automated machine learning. Advances in Neural Information Processing Systems 33 (2020)
- Khalifa, M., Islam, A.: Will your forthcoming book be successful? Predicting book success with CNN and readability scores. In: 55th Hawaii International Conference on System Sciences (HICSS 2022) (2022)
- Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020)
- Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020)
- Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
- Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
- Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
- Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
- Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
- Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
- Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
- Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
- Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
- Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
- Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
- Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
- Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
- Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
- Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
- Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
- Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
- Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
- Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
- Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. 
[1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. 
arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Khalifa, M., Islam, A.: Will your forthcoming book be successful? Predicting book success with CNN and readability scores. In: 55th Hawaii International Conference on System Sciences (HICSS 2022) (2022)
- Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020)
- Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020)
- Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020)
- Dong, J., Roth, S., Schiele, B.: Deep Wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020)
- Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020)
- Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
- Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
- Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
- Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge, MA (2016). http://www.deeplearningbook.org
- Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR
- Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017)
- Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
- Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
- Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
- Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
- Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: CondenseNet: An efficient DenseNet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
- Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet v2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
- Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010)
- Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
- Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
- Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
- Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018).
https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Li, G., Zhang, J., Wang, Y., Liu, C., Tan, M., Lin, Y., Zhang, W., Feng, J., Zhang, T.: Residual distillation: Towards portable deep neural networks without shortcuts. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), pp. 8935–8946 (2020) Reddy et al. [2020] Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. 
Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Reddy, M.V., Banburski, A., Pant, N., Poggio, T.: Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems 33 (2020) Kim et al. [2020] Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. 
arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
[2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Kim, W., Kim, S., Park, M., Jeon, G.: Neuron merging: Compensating for pruned neurons. Advances in Neural Information Processing Systems 33 (2020) Dong et al. [2020] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: 34th Conference on Neural Information Processing Systems (2020) Liu et al. [2020] Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Liu, R., Wu, T., Mozafari, B.: Adam with bandit sampling for deep learning. Advances in Neural Information Processing Systems (2020) Huang et al. [2020] Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. 
[1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020) Curry et al. [2020] Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. 
[1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020) Yang et al. [2021] Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795 (2021) Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. 
In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. 
[2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. 
[2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. 
[2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. 
In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Huang, Q., He, H., Singh, A., Zhang, Y., Lim, S.-N., Benson, A.: Better set representations for relational reasoning. Advances in Neural Information Processing Systems (2020)
- Curry, M., Chiang, P.-Y., Goldstein, T., Dickerson, J.: Certifying strategyproof auction networks. Advances in Neural Information Processing Systems 33 (2020)
- Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv preprint arXiv:2110.14795 (2021)
- Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, ??? (2016). http://www.deeplearningbook.org Duvenaud et al. [2016] Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. 
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. 
[2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016). PMLR Mahsereci et al. [2017] Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. 
[2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 
111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Mahsereci, M., Balles, L., Lassner, C., Hennig, P.: Early stopping without a validation set. arXiv preprint arXiv:1703.09580 (2017) Bonet et al. [2021] Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via nnk polytope interpolation. arXiv preprint arXiv:2107.12972 (2021) Chollet [2017] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. 
[2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017) Howard et al. [2017] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017) Zhang et al. [2018] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018) Huang et al. [2018] Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018) Ma et al. [2018] Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018) LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. 
[2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Bonet, D., Ortega, A., Ruiz-Hidalgo, J., Shekkizhar, S.: Channel-wise early stopping without a validation set via NNK polytope interpolation. arXiv preprint arXiv:2107.12972 (2021)
- Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
- Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
- Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)
- Huang, G., Liu, S., Maaten, L., Weinberger, K.Q.: Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761 (2018)
- Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) Boureau et al. [2010] Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). 
https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 111–118 (2010) Nair and Hinton [2010] Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Icml (2010) Krizhevsky et al. [2009] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) Netzer et al. [2011] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011) Novak et al. [2018] Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW
- Novak, R., Bahri, Y., Abolafia, D.A., Pennington, J., Sohl-Dickstein, J.: Sensitivity and generalization in neural networks: an empirical study. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJC2SzZCW